Home
Welcome to the website of Prof. Kim-Anh Lê Cao’s lab group at Melbourne Integrative Genomics and the School of Mathematics and Statistics, University of Melbourne!
Our lab focuses on the development of computational methods, their applications in areas informed by biology, and the training of the new generation of computational biologists and data analysts. Our area of expertise is in the integration of biological ‘omics data (transcriptomics, proteomics, metabolomics etc., as well as microbiome, metagenomics, single cell transcriptomics and multi-omics) with multivariate and dimension reduction methodologies, selection of features or biomarkers in large biological data sets and R software development. Our group provides critical collaborative expertise to biologists, bioinformaticians, statisticians and clinicians and welcomes budding data analysts.
Our aim is to broadly enable scientific progress well beyond statistical development itself. We value creative thinking in statistical methodological development to address critical challenges arising from high throughput biological research.
More news (workshops and updates): www.mixOmics.org
Lab head: Prof Kim-Anh Lê Cao
Director of research (Maths & Stats)
Melbourne Integrative Genomics (MIG) & School of Mathematics and Statistics
Building 184 ground floor | University of Melbourne | Parkville VIC 3010
@: kimanh.lecao[ at ]unimelb.edu.au | twitter: @mixOmics_team | Ph: +61 3 8344 3971
News
Click on this link to read all our News
-
Metaphor: A workflow for streamlined assembly and binning of metagenomes
Genome-resolved metagenomics techniques aim to recover genomes from high-throughput sequencing data, and have led to unprecedented insight into microbial diversity, ecology, and evolution. However, data pipelines for assembling and binning metagenomes are inherently complex, have high computing cost, use heterogeneous data sources, have dozens of customizable parameters, and depend on several specialized bioinformatics software tools. PhD student Vinícius Salazar introduces Metaphor, building on …
August 2, 2023 microbiome, Man... -
mixOmics shortlisted in prestigious Eureka Prize from the Australian Museum
[from the University of Melbourne News room] Software developed by the Faculty of Science’s Professor Kim-Anh Lê Cao (School of Mathematics and Statistics) has been shortlisted for the Australian Museum’s 2023 Eureka Prizes. The annual Eureka Prizes are Australia’s most comprehensive national science awards, honouring excellence across the areas of research and innovation, leadership, science engagement, and school science. Professor Lê Cao’s software, …
July 27, 2023 mixOmics, video... -
Self-paced online mixOmics course May 22 – July 7 2023
Registrations for the second iteration of our online mixOmics course are now open! More details here. Feedback from our last iteration can be found here. If you've missed out, our next iteration will probably be early 2024! You can fill in this short survey to be notified when we open our next course.
April 27, 2023 workshop, mixOm... -
Article: PLSDA-batch: a multivariate framework to correct for batch effects in microbiome data
It's been a long journey! The method developed by my former PhD student Eva Yiwen Wang is out and ready to be used! Yiwen Wang, Kim-Anh Lê Cao, PLSDA-batch: a multivariate framework to correct for batch effects in microbiome data, Briefings in Bioinformatics, 2023, bbac622, https://doi.org/10.1093/bib/bbac622 Key Points We developed a set of three multivariate and non-parametric batch effect correction methods for microbiome data to …
January 20, 2023 microbiome, mic... -
Self-paced online mixOmics course Oct 31st – Nov 27 2022
Registrations for the second iteration of our online mixOmics course are now open! More details here.
August 14, 2022 workshop, mixOm... -
Review article: Statistical challenges in longitudinal microbiome data analysis
With Saritha Kodikara and Susan Ellul, we have published our latest review and benchmarked existing methods for longitudinal microbiome analysis on simulated data and a case study. The methods we have identified can be categorised into differential abundance analysis, clustering and network modelling. The key points from our review are: Longitudinal microbiome studies are conducted to understand the temporal variations of …
July 27, 2022 microbiome, rev... -
Sincast: a computational framework to predict cell identities in single-cell transcriptomes using bulk atlases as references
Following up from our preliminary work on how to query cell types from transcriptomics bulk studies using bulk atlases, our talented student Yidi Deng has gone ahead to establish a new framework to query single cell transcriptomics studies based on transcriptomics bulk atlases! Our article is now published: Yidi Deng, Jarny Choi*, Kim-Anh Lê Cao*, Sincast: a computational framework to predict cell …
April 3, 2022 single cell, News -
Outreach: visit at Melbourne Girls’ College
With some of the outreach team from the School of Mathematics and Statistics, I visited Melbourne Girls’ College, a highly selective girls’ state school, yet very few (if any) of these students continue in mathematics at our University. I gave a presentation about my work and career. We then ran an activity using gene expression data in a case …
February 28, 2022 superstars of S... -
Book: Multivariate Data Integration Using R: Methods and Applications with the mixOmics package
We are excited to announce that our book is out, along with several case studies and R scripts available online. Check out this page. It’s been a very (very) long term project, and a great collaboration with Zoe Welham whose dedication and patience helped shape this project into a readable whole! A huge thank you to Al Abadi, who tirelessly helped update the package …
November 8, 2021 mixOmics, News -
Community-wide hackathons to identify central themes in single-cell multi-omics
Last year we ran an online event as part of Banff International Research Station workshops. We designed and analysed three single cell multi-omics hackathons with another 30 participants. Our highlights and findings are summarised in this commentary. In addition, our workshop was featured in this article in Nature Technology magazine. Some of our key messages: 1. Algorithms developed for bulk multi-omics data do not …
August 6, 2021 single cell, vi... -
Superstars of STEM
Kim-Anh was selected as one of Australia’s newest Superstars of STEM – 60 brilliant women in science, technology, engineering and mathematics who want to step into the spotlight as experts in their fields. Superstars of STEM is a 2-year national program that gives women in STEM stronger skills and confidence to take on expert commentary roles in the media. Five University of Melbourne …
December 4, 2020 News -
Postdoc position available
[this position has now been filled] A new position is available in our lab with the following description. Title: research fellow, computational genomics and statistics Position summary: The School of Mathematics and Statistics (https://ms.unimelb.edu.au), and its partner Melbourne Integrative Genomics (MIG, https://research.unimelb.edu.au/integrative-genomics) are seeking a qualified and enthusiastic Research Fellow to lead cutting-edge research in method development, implementation and analysis of biological data. The …
September 22, 2020 omics integrati...
Our lab specialises in computational methods and software development, as well as the application of our methods and tools to biological data sets generated by our collaborators.
Data integration methods using multivariate projection-based methodologies
Our dimension reduction methods are based on the Projection to Latent Structures algorithm (PLS, a term we prefer to Partial Least Squares regression, Wold et al. 2001), combined with LASSO regularization to identify important biological features or biomarkers in large-scale biological data sets. Our latest frameworks include DIABLO (Singh et al. 2019) to integrate multiple data sets measured on the same N samples (N-integration); MINT (Rohart et al. 2017a) to integrate independent studies measured on the same P variables / genes (P-integration); mixMC (Lê Cao et al. 2016) for the multivariate analysis of microbial communities; and timeOmics (Bodein et al. 2020) to integrate microbiome and ‘omics time course data.
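As a rough illustration of how a LASSO-type (L1) penalty leads to feature selection in a PLS-style projection, the sketch below computes a single sparse weight vector for one response variable by soft-thresholding the covariance between each centred feature and the response. It is a simplified sketch in plain Python and not the mixOmics implementation; the function names, the toy data and the penalty value are all assumptions made for illustration.

```python
import math


def center(column):
    """Subtract the mean from a list of numbers."""
    mean = sum(column) / len(column)
    return [value - mean for value in column]


def soft_threshold(value, penalty):
    """Shrink a value towards zero; values within the penalty become exactly 0."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def sparse_pls_weights(X, y, penalty):
    """First sparse weight vector for a single response (illustrative only).

    X is a list of rows (samples x features), y a list of responses.
    Features whose covariance with y is small are dropped (weight 0).
    """
    n_samples, n_features = len(X), len(X[0])
    columns = [center([row[j] for row in X]) for j in range(n_features)]
    y_centered = center(y)
    # Unscaled covariance between each feature and the response.
    covariances = [
        sum(col[i] * y_centered[i] for i in range(n_samples)) for col in columns
    ]
    shrunk = [soft_threshold(c, penalty) for c in covariances]
    norm = math.sqrt(sum(w * w for w in shrunk))
    if norm == 0.0:
        return shrunk  # penalty too large: every feature was removed
    return [w / norm for w in shrunk]


# Hypothetical toy data: feature 1 tracks y, feature 2 is noise, feature 3 is reversed.
X = [[1, 0.1, 5], [2, -0.1, 4], [3, 0.1, 3], [4, -0.1, 2]]
y = [1, 2, 3, 4]
weights = sparse_pls_weights(X, y, penalty=1.0)
print(weights)
```

With this toy input the noise feature receives a weight of exactly zero, while the two informative features keep weights of opposite sign. Larger penalties select fewer features.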
We are interested in developing new multivariate methodologies to
- integrate multi-omics single cell data on matching cells (multiome ATAC-seq, CITE-seq, scNMT-seq)
- integrate multi-omics time course data
- integrate multi-omics and microbiome data
mixOmics R toolkit package (www.mixOmics.org)
mixOmics is one of the few R packages dedicated to the integration of multiple ‘omics data (19 novel methodologies implemented so far, amongst which 13 were developed by our lab), with an increasing uptake from the research community. The package is in the top 5% of downloads in Bioconductor (~80K downloads / year). Check out our recent publication (Rohart et al. 2017b) and 50-min webinar overview about this project. The mixOmics team runs multi-day workshops that introduce multivariate projection-based methods for data integration using mixOmics; see our website www.mixOmics.org for news, tutorials and online courses. We continuously improve our software to support our growing user base.
Multivariate methods for microbiome studies
There are major statistical and computational challenges in analysing microbial communities that currently hinder the potential of microbiome research to substantially advance biomedical understanding. We are currently expanding mixMC to better characterise and understand important microbiome-host interactions. Some of our method developments aim to address batch effects in microbiome experiments and to analyse sparse temporal sampling in time course studies.
We analyse microbiome datasets from our collaborators for a wide range of studies, including the role of the gut and oral microbiome in spondyloarthropathy diseases, the development of intestinal or salivary microbiota in toddlers and infants, the gut-brain crosstalk in Huntington’s disease, and anaerobic digestion in fermenters.

Current members
Staff members
- Jiadong Mao, postdoctoral fellow
- Saritha Kodikara, postdoctoral fellow
Higher Degree Research students
- Xiaochen Zhang, Msc Computational statistics
- Yidi Deng, PhD candidate, UoM, in co-supervision with Dr Jarny Choi (Centre for Stem Cell Systems)
- Vinícius Salazar, PhD candidate, UoM, in co-supervision with A/Prof Heroen Verbruggen and Dr Vanessa Rossetto Marcelino (Hudson Institute of Medical Research)
- Daniel Rawlinson, PhD candidate, in co-supervision with Prof Lachlan Coin, Doherty Institute
- Guannan Yang, PhD candidate, UoM, in co-supervision with Prof Eva Dimitriadis and Dr Elle Menkhorst, Department of Obstetrics and Gynaecology
Visitors (see our list of past visitors here)
- Yi-Wen Hsiao, PhD candidate, visitor from University of Taiwan (1 year stay)
- Marko Terzin, PhD candidate, James Cook University, 3 months 2023
We welcome any students and staff who are interested in statistical analysis of omics data and wish to attend our fortnightly group meetings!
Alumni staff
- Max Bladen, now research assistant in genomics
- Daniel Tien Phung, now PhD student
- Al J Abadi, now Senior Data Scientist
- Katherine Lange, Murdoch Research Institute
- Aleksandar Dakic
- Zitong Li, now Senior scientist at CSIRO
- Malathi Imiyage Dona, postdoctoral fellow at Baker institute
- Florian Rohart, now Senior Strategic Insights Lead
- Nicholas Matigian, now biostatistician at QFAB Bioinformatics
- Benoit Gautier, now teacher in mathematics in France
Alumni students (PhD)
- Eva Yiwen Wang, ‘Statistical and Computational Methods for Microbiome Data Analysis’, UoM, now assistant professor at Agricultural Genomics Institute at Shenzhen
- Aimee Hanson, ‘Lymphocyte receptors: Genomic structure and role in immune-mediated arthritis’ with main supervisor Prof Matt Brown (QUT) and Diamantina Institute, Faculty of Medicine, University of Queensland, now research fellow at King’s College London.
- Farah Syeda Zahir, ‘Obesity paradox: Exploring the relationship between adiposity and mortality in persons with Cardiovascular Disease and/or Type 2 Diabetes Mellitus’, co-supervised with Dr Ahmed Medi (Diamantina Institute), School of public Health, University of Queensland, now biostatistician at QFAB
- Jasmin Straube ‘Development of statistical tools for integrating time course ‘omics’ data’ with co-supervisors Dr Emma Huang and Dr Anne Bernard, QFAB and University of Queensland, now research fellow at QIMR
- Ralph Patrick ‘Molecular interaction motifs in a system-wide network context: Computationally charting transient kinase-substrate phosphorylation events’ with main supervisor A/Prof Mikael Boden, University of Queensland, now research fellow at Victor Chang institute
- Amrit Singh ‘Blood biomarker panels of the late phase asthmatic response’ with main supervisor Prof Scott Tebbutt, University of British Columbia, Vancouver, Canada, now Assistant Professor at University of British Columbia
- Chao Liu ‘Computational analysis of DNA repair pathways in breast cancer’ with main supervisor Prof Mark Ragan, Institute for Molecular Bioscience, University of Queensland
Alumni students (Honours and Msc)
- Isaac Virshup, MPhil ‘Finding patterns of biologically meaningful transcript expression by examining heterogenous sets of cells’, UoM with main supervisor Prof Christine Wells
- Qinwen Chen, Msc Statistics by coursework, UoM
- Guannan Yang, Honours, co-supervision with Prof Eva Dimitriadis and Dr Ellen Menkhorst (Department of Obstetrics & Gynaecology)
- Sibi Xue, Msc Statistics by coursework, UoM
- Yinghua Shen, Msc Statistics by coursework, UoM
- Mengqi (Chi-Chi) Hu, Msc Bioinformatics by coursework, UoM
- Alana Butler, Master of Science (Bioinformatics), UoM, now research assistant at Monash University.
- Nicholas d’Arcy, Nicholas Mueller, University of Queensland
- Solange Pruilh, Zoe Welham, Vanessa Lakis, Priscilla Montfalet, Thom Cuddihy, Mourad Larbi, Jeff Coquery, Pierre Monget who did a research placement in our lab.
Conventional publications
Publications are listed here. We are fervent advocates of open science and open data, with some manuscripts hosted on bioRxiv, and all R code and scripts on our mixOmics page and github.com/mixOmicsTeam.
Book
Lê Cao K-A. and Welham Z (2021). Multivariate Data Integration Using R: Methods and Applications with the mixOmics package. CRC Chapman & Hall. 306 pages, 14 chapters.
Other publications (refereed by editorial board)
- Huang BE, Clifford D and Lê Cao K-A (2014). The surprising benefit of passive-aggressive behaviour at Christmas parties: being crowned king of the crackers. Medical Journal of Australia 201(11):694-6 (Christmas issue, awarded first prize, radio interview from ABC Darwin, mentioned in the Two Shrink Pod podcast, episode 21, Dec 2017).
- Clifford D, Lê Cao K-A and Huang BE (2014). The statistician’s guide to a cracking good Christmas party. Significance 11(5):44-7 (Christmas issue, doi: 10.1111/j.1740-9713.2014.00784.x).
- Lê Cao K-A (2019). Heroines of mathematics. The Pursuit, an engagement journal published at UoM (8.4K views, August 2019, co-written, leading to an interview at ABC radio with Myf Warhurst)
- Lê Cao K-A (2020). Get to know your microbiome better. The Pursuit (2.5K views, Jan 2020).
- Lê Cao, K-A., Abadi, A.J., Davis-Marcisak, E.F. et al.(2021) Community-wide hackathons to identify central themes in single-cell multi-omics. Genome Biol 22,
Awards and fellowships
2020 – 2022 Superstars of STEM (Science Technology Australia) to gain advanced communication skills and increase the public visibility of women in STEM.
2019 Homeward Bound year-long leadership program for women with a background in STEMM, culminating in a voyage to Antarctica.
2019 The University of Melbourne Dean’s Award for Excellence in Research (mid-career)
2019 Georgina Sweet Award created by Prof L Tilley (ARC Laureate) to promote female scientists with excellence in Quantitative Biomedical Science (up to 3 awards / year)
2019 – 2022 Career Development Fellowship (CDF2) from the National Health and Medical Research Council (NHMRC) ‘Microbiome biomarkers of human disease: novel computational methods to facilitate therapeutic developments’, $483K.
2019 Moran medal from the Australian Academy of Science for contribution in the past 10 years in Statistical sciences in Australia (early-career, biennial)
2015 – 2019 Career Development Fellowship (CDF1) from the National Health and Medical Research Council (NHMRC) ‘Development of statistical methodologies and application to clinical cancer studies’, $419K.
2009 Laurent-Duhamel triennial prize from the French Statistical Society for PhD thesis in Applied Statistics, Bordeaux, France.
Current funding (UoM)
2021 – 2026 MRFF Preventive and Public Health Research Initiative. Infant2Child: Targeting common risk factors to optimise nutrition and reduce childhood dental caries in the first 2000 days. Dr M Silva, A/Prof Rachel Laws, Dr Margarita Moreno-Betancur, Prof Stuart Dashper, Dr Miaobing Zheng, A/Prof Martin Hall, Prof David Burgner, Dr Ankur Singh, Associate Professor Nicky Kilpatrick, A/Prof K-A Lê Cao, Dr L Petrick. Total AUD$1,267,826.21 (150k in 2024-2026)
2021 – 2023 Juvenile Diabetes Research Foundation Australia, ENDIA Early-Mid Career Science Accelerator Awards. Influence of early life and maternal host-microbiota interactions on type 1 diabetes risk. A/Prof E Hamilton Williams, Prof M Knip, A/Prof M Hill, A/Prof K-A Lê Cao, Prof Harrison. AUD$ 442,502
2021 – 2026 National Institute of Dental and Craniofacial Research/National Institutes of Health R01DE029838-01 Reconstructing early life environmental exposures using tooth biomarkers and their influence on the trajectory of the oral microbiome and oral health in childhood. Dr C Austin, Dr C Adler, Prof M Arora, Dr M Bockman, Dr P Curtin, A/Prof Hughes, A/Prof K-A Lê Cao, Dr L Petrick. Role: PI. Total USD$ 3,655,958. (~ USD$166.426 from Y3-5). Icahn School of Medicine at Mount Sinai, University of Adelaide, Sydney and Melbourne.
2020 – 2023 ARC Discovery Project DP200102903. Empirical and computational solutions for multi-omics single-cell assays. A/Prof K-A Lê Cao, Dr Heather Lee (UoN), A/Prof Matt Ritchie (WEHI) and A/Prof. Stephanie Bougeard (ANSES). Role: CIA. $650K
Patents
The application of our methods and software has directly resulted in four biomedical patents.
- Gandhi M, Keane C, Lê Cao K-A, Vari F (2015). A method of assessing prognosis of lymphoma. WO/2016/134416. Priority 23/02/2016
- Thomas R, Mehdi A, Lê Cao K-A (2014). Kits and methods for the diagnosis, treatment, prevention and monitoring of diabetes. PCT/AU2014/050415. Priority 18/06/2015
- Hill M, Shah A, Lê Cao K-A (2014). Blood Test for Throat Cancer. WO/2016/077881. Priority 17/11/2015
- Musso O, Desert R, Rohart F, Lê Cao K-A. Method for predicting the survival time of a patient suffering from hepatocellular Carcinoma. EP17305436.2. Priority 12/04/2017
Past funding (UoM and University of Queensland)
2018 – 2021 NHMRC Project Grant, GNT1142456. Enhancing host defence mechanisms in severe bacterial infections. Dr A Blumenthal, Prof B Venkatesh, Prof D Evans, Dr K-A Lê Cao, Prof G Ulett, A/Prof J Cohen. Role: CID. $837K
2018 – 2021 NHMRC Project Grant, GNT1144941. Understanding how azithromycin prevents exacerbations in severe asthma. Prof J Upham, Prof J Simpson, Dr K Baines, Dr K-A Lê Cao. Role: CID. $698K
2018 – 2019 Silicon Valley Community Foundation, HCA2-A-1708-02277, Multivariate computational methods for data integration of single cell assays. Role: CIA. $132K
2018 – 2019 ARC Special Research Initiative in Stem cells Centre of Excellence, Stem Cells Australia led by Prof M Little (UoM). Role: co-CI. $3M, 1 research fellow in Lê Cao group.
2018 UoM Computational Biology Research Initiative seed funding. Towards the understanding of gut-brain crosstalk in Huntington’s disease. Role: CIA. $20K.
2016 Translational Research Institute SPORE grant, Obesity-induced Barrett’s oesophagus and associated cancer: mechanisms and diagnostic tools. A/Prof M. Hill, Dr A. Barbour, Dr K-A. Lê Cao (CIC). $100K
2016 Translational Research Institute SPORE grant, Towards biomarkers for patient stratification in sepsis, Dr A. Blumenthal, Prof B. Venkatesh, A/Prof J Cohen, Dr K-A. Lê Cao, Dr D. Vagenas, Prof I. Frazer (CID). $80K
2014 – 2015 The Juvenile Diabetes Research Foundation (JDRF), 2-SRA-2015-306-Q-R, A genetic link between gut microbial flora and T1D susceptibility. Dr D. Zipris (University of Colorado) and co-CI from UQDI: Dr E. Hamilton-Williams, Dr J. Mullaney, A/Prof M. Hill, Dr K-A. Lê Cao (PI). $500K
2014 The Juvenile Diabetes Research Foundation (JDRF), 1-PNF-2014-153-A-V, Risk of diabetes progression in at-risk subjects with metabolic and inflammatory signatures. Prof R. Thomas (UQDI), K-A. Lê Cao et al. (PI). $110K
2014 UQ Major Equipment and Infrastructure, 2014000102, High throughput gene expression of patient samples via the Nanostring nCounter system. Prof M. Gandhi and 9 co-CI from UQDI, K-A. Lê Cao (CIJ). $169K
2014 – 2016 NHMRC Project Grants Funding, APP1058993, Blood biomarkers in Hodgkin Lymphoma. Prof M. Gandhi, Prof M. Fulham, A/Prof J. Trotman, Dr K-A. Lê Cao, Dr L. Berkahn. (CID). $513K
2013 – 2015 ARC Discovery Project, DP130100777. The Stemformatics gene expression compendium: development of multivariate statistical approaches for cross platform analyses. A/Prof C. Wells, Dr K-A. Lê Cao (CIB). $269K, shared postdoctoral fellow.
Our lab aims to inspire younger generations of budding statisticians, data analysts and computational biologists to advance the field of computational biostatistics.
All our members use GitHub and strive for reproducible research, see:
- https://github.com/ajabadi
- https://github.com/EvaYiwenWang
- https://github.com/abodein/timeOmics
- https://github.com/ivirshup
- https://github.com/mixOmicsTeam
- https://github.com/SarithaKodikara
- https://github.com/meiosis97/Sincast
Current
2020 – Handbook about multivariate projection-based methods and how to apply them using mixOmics to integrate biological data.
2020 – A 6-week online course ‘mixOmics R Essentials for Biological Data Integration’ that will be run once or twice a year (first iteration was in Nov 2020).
2019 – We have developed a 16-week online course open to University of Melbourne students called ‘Data fundamentals’ with Dr Sue Finch (Statistical Consulting Centre, School of Mathematics and Statistics). The course is offered every trimester. Have a look at this page if you wish to register; it is a fun course to learn how to work with data.
Since 2014 – We teach specialised workshops to introduce key concepts in multivariate statistics, with applications using the R software mixOmics. Our mixOmics web page provides numerous tutorials to apply the different multivariate integrative methods implemented in mixOmics.
Past
We taught the introductory statistics course ‘Statistics for frightened bioresearchers’; lecture materials can be found here.
Below is a list of opportunities in our lab, including undergraduate and postgraduate research projects and scientific visits.
Positions
We are looking for self-motivated candidates in the field of computational statistics applied to high-throughput biological data, as well as data analysts and software developers. Contact us!
Students
We welcome undergraduate, hons/Msc and PhD students willing to be part of the group to apply our methods to specific biological problems, or develop innovative computational methods at the forefront of ‘omics and microbiome data integration. There are plenty of projects to choose from across our research themes and cross-discipline projects. Some are listed here.
Visiting scientists
We welcome wet-lab researchers and assist them in acquiring the necessary skillsets to analyse their own data with our tools, and dry-lab researchers to collaborate on our many exciting projects.
Dr Olivier Chapleur and Ms Laetitia Cardona stayed for 10 and 3 weeks with us in 2017. Here is a brief description of the work they undertook with us, and their feedback.
Dr Sébastien Déjean stayed for 5 weeks with us in July 2018 and helped run a mixOmics workshop.
Prof Malu Calle Rosingana visited us for 4 weeks in January 2019.
Stijn Hawinkel, PhD candidate with Prof Olivier Thas (Ghent University), visited us for 3 months (March – May 2019).
Dr Olivier Chapleur and Ms Laetitia Cardona were back for 4 and 6 weeks from April 2019. They gave us a hand for our upcoming mixOmics workshop focusing on microbiome data analysis.
Attila Csala, PhD candidate with Prof Aeilko Zwinderman (University of Amsterdam), visited us for 4 months (Nov 2019 – March 2020).
Pedro Salguero García, PhD candidate with Prof Ana Conesa and Dr Sonia Tarazona at the Polytechnical University of Valencia, Spain, came for 3 months in 2022.
Quentin Le-Graverand, PhD candidate from INRAE Toulouse came for 3 months in 2022.
Prof Ivy Chung, University of Malaya, 3 months 2023.
@: kimanh.lecao[ at ]unimelb.edu.au
Ph: +61 3 8344 3971
We are located at:
Melbourne Integrative Genomics | Old microbiology building 184 ground floor | The University of Melbourne
Main entrance is through Royal Parade, approximately at 30 Royal Parade, next to the Kenneth Meyer building (Tram Route 19, Stop no. 11 from the city centre).
Google map pin (front entrance): https://goo.gl/maps/te888rSFeyc6LqgV6
There is a phone in the reception area, with contact numbers. Give us a buzz then.
Back entrance (underneath the stairs): https://goo.gl/maps/2TPrvbsaxGkrxBKC8