Our lab focuses on developing computational methods, applying them in areas informed by biology, and training the next generation of computational biologists and data analysts. Our expertise lies in integrating biological ‘omics data (transcriptomics, proteomics, metabolomics etc., as well as microbiome metagenomics and single cell transcriptomics) with multivariate and dimension reduction methodologies, selecting features or biomarkers in large biological data sets, and R software development. Our group provides critical collaborative expertise to biologists, bioinformaticians, statisticians and clinicians, and welcomes budding data analysts.
Our aim is to broadly enable scientific progress well beyond statistical development itself. We value creative thinking in statistical methodological development to address critical challenges arising from high throughput biological research.
More news (workshops and updates): www.mixOmics.org
Lab head: Dr Kim-Anh Lê Cao
Snr Lecturer, NHMRC Career Development Fellow
Melbourne Integrative Genomics (MIG) & School of Mathematics and Statistics
Building 184 ground floor | University of Melbourne | Parkville VIC 3010
@: kimanh.lecao[ at ]unimelb.edu.au | twitter: mixOmics_team | Ph: +61 3 8344 3971
Click on this link to read all our News
Kim-Anh Lê Cao won the Georgina Sweet Award 2019!
Dr Kim-Anh Lê Cao has been awarded the Georgina Sweet Award to promote and support female scientists who demonstrate excellence in the area of Quantitative Biomedical Science. Dr Lê Cao is a Senior Lecturer in Statistical Genomics in the School of Mathematics and Statistics. She is also a member of the Centre for Stem Cell Systems. Her research lab at Melbourne Integrative Genomics focuses on …October 18, 2019 award, News
New framework: timeOmics published
In collaboration with Antoine Bodein and Arnaud Droit from Université Laval (bioinformatics) and Olivier Chapleur (IRSTEA, France, microbiology), we have developed the timeOmics framework to handle time course and longitudinal datasets. The paper is now published, and our GitHub repository is in progress. We have applied our framework in two microbiome-related studies. The gut infant microbiome development study investigates …September 12, 2019 News
04 SEP 2019: Multi-omics statistical integration with mixOmics. The extensive mixOmics R toolkit is dedicated to the integrative analysis of ‘omics’ data to help researchers make sense of biological big data. Dr Kim-Anh Lê Cao is an expert in multivariate statistical methods and develops novel methods for ‘omics data integration. Dr Kim-Anh Lê Cao completed her PhD in 2008 at the …September 2, 2019 Seminar, mixOmi...
UoM Women in Maths Day: 29th August 2019 + Update (pursuit article and radio interview)
The day will celebrate Women of Mathematics through a range of activities, including a networking lunch, a panel discussion, research talks, and an official opening of the exhibit "Women of Mathematics throughout Europe" at the University of Melbourne. I will be presenting Florence Nightingale, a pioneer who has made important contributions in graphical representations of statistics, amongst other things! More details about …August 20, 2019 Outreach, News
VIC Biostat seminar
Multivariate microbiome data analysis. Thursday, 22 August 2019, 9.30-10.30am, DEPM Conference Room 1, 553 St Kilda Road, Melbourne, Australia. Our recent breakthroughs and advances in culture independent techniques (whole genome shotgun metagenomics, 16S rRNA amplicon sequencing) have dramatically changed the way we can examine microbial communities. But does the hype of microbiome outweigh the potential of our understanding of this ‘second genome’? There are …August 19, 2019 Seminar, microb...
WEHI special seminar series
Technological improvements have allowed for the collection of data from different molecular compartments resulting in multiple omics data from the same set of biospecimens. We propose to adopt a holistic approach to glean molecular insights of a biological system. Integrating data poses numerous challenges – data are complex and large, each with few samples and many molecules, and generated using …July 29, 2019 Seminar, News
Winter School in Mathematical & Computational Biology 1-5 July 2019 Brisbane
We presented at the opening session 'Next generation sequencing & bioinformatics' to introduce our latest cool method DIABLO. Technological improvements have allowed for the collection of data from different molecular compartments (e.g. gene expression, protein abundance) resulting in multiple omics data from the same set of biospecimens or individuals (e.g. transcriptomics, proteomics). We propose to adopt a systems biology holistic approach …July 1, 2019 Seminar, News
The powerful mind public event, June 26, UoM
As part of her Homeward Bound leadership program, Kim-Anh Lê Cao invites you to join her crowdfunding public event ‘the powerful mind’ this Wednesday. Pushing our limits is key to our success, but how can we use our inner resources to our advantage? This public event hosted at the University of Melbourne will explore the role of the mind for …June 25, 2019 News
CellBench: a new package helping researchers to select the best tool for interpreting single-cell datasets.
Much is still unknown about the trillions of cells in the human body. In the quest to better understand cells and the role they play in health and disease, a technique called single-cell sequencing has become a hot research field. Over the past five years there has been an explosion of new analysis tools for interpreting single-cell data. This has left …May 28, 2019 single cell, News
Empowering researchers to understand microbiome data for frontier projects
The recent mixOmics workshop equipped Australian microbial ecology researchers with the analytical tools needed to analyse their complex data, enabling them to explore and accelerate discoveries towards these new frontier technologies in sustainability. Dr Kim-Anh Lê Cao, senior lecturer in the School of Mathematics and Statistics, and French collaborators Dr Olivier Chapleur and Ms Laetitia Cardona, have successfully prepared and run …May 3, 2019 workshop, News
Multi-omics data integration: method and ground-breaking neonate study
The Lê Cao team and collaborators from the University of British Columbia (Vancouver, Canada) have published their first method to integrate multiple omics data from the same set of biospecimens or individuals (e.g. transcriptomics, proteomics). Their method adopts a systems biology holistic approach by statistically integrating data from multiple biological compartments. Such an approach provides improved biological insights compared with traditional single omics analyses, as it …March 13, 2019 News
Moran Medal from the Australian Academy of Science
The Moran Medal in Statistical Sciences is awarded every two years by the Australian Academy of Science to recognize outstanding research by Australian scientists below 10 years post PhD in the fields of applied probability, biometrics, mathematical genetics, psychometrics, and statistics. Delighted to announce that both Kim-Anh Lê Cao and Stephen Leslie from Melbourne Integrative Genomics have been awarded 2019 Moran Medals …February 28, 2019 News
Our lab specialises in computational methods and software development, as well as the application of our methods and tools to biological data sets generated by our collaborators.
Data integration methods using multivariate projection-based methodologies
Our dimension reduction methods are based on the Projection to Latent Structures algorithm (PLS, a term we prefer to Partial Least Squares regression, Wold et al. 2001), which we combine with LASSO regularization to identify important biological features or biomarkers in large-scale biological data sets. Our latest frameworks include DIABLO (Singh et al. 2019) to integrate multiple data sets measured on the same N samples (N-integration); MINT (Rohart et al. 2017a) to integrate independent studies measured on the same P variables / genes (P-integration); mixMC (Lê Cao et al. 2016) for the multivariate analysis of microbial communities; and timeOmics (Bodein et al. 2019) to integrate microbiome and ‘omics time course data.
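To give a feel for how PLS and LASSO regularization fit together, here is a minimal, dependency-free Python sketch of one sparse PLS component. It is an illustration under stated assumptions, not the mixOmics implementation. The function names, the toy data and the penalty `lam` are all hypothetical, and the penalty is applied on the raw scale of the cross-product matrix. The sketch alternates between the X and Y loading vectors and soft-thresholds the X loadings, so that weakly associated features receive a weight of exactly zero.

```python
import math


def soft_threshold(values, lam):
    """LASSO-style soft-thresholding: shrink each value towards zero by lam."""
    return [math.copysign(max(abs(v) - lam, 0.0), v) for v in values]


def normalise(vec):
    """Scale a vector to unit length (an all-zero vector is returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def centre_columns(rows):
    """Subtract the column mean from every column of a row-major matrix."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def sparse_pls_component(X, Y, lam, n_iter=100):
    """First sparse PLS component for X (n x p) and Y (n x q), given as lists of rows.

    Returns the sparse X loadings u, the Y loadings v and the sample scores X u.
    """
    X = centre_columns(X)
    Y = centre_columns(Y)
    p, q = len(X[0]), len(Y[0])
    # Cross-product matrix M = X^T Y (p x q).
    M = [[sum(xr[j] * yr[k] for xr, yr in zip(X, Y)) for k in range(q)]
         for j in range(p)]
    v = normalise([1.0] * q)
    u = [0.0] * p
    for _ in range(n_iter):
        # X loadings: project M onto v, then apply the LASSO penalty.
        Mv = [sum(M[j][k] * v[k] for k in range(q)) for j in range(p)]
        u = normalise(soft_threshold(Mv, lam))
        # Y loadings: project M^T onto u (left unpenalised in this sketch).
        v = normalise([sum(M[j][k] * u[j] for j in range(p)) for k in range(q)])
    scores = [sum(x * w for x, w in zip(row, u)) for row in X]
    return u, v, scores


# Toy data (made up): features 1 and 2 track the response, features 3 and 4 are noise.
X_demo = [
    [1.0, 2.0, 0.1, -0.2],
    [2.0, 4.1, -0.1, 0.1],
    [3.0, 6.2, 0.2, 0.0],
    [4.0, 7.9, -0.2, 0.1],
    [5.0, 10.0, 0.0, -0.1],
]
Y_demo = [[1.1], [2.0], [2.9], [4.2], [5.0]]

u_demo, v_demo, scores_demo = sparse_pls_component(X_demo, Y_demo, lam=1.0)
selected = [j for j, w in enumerate(u_demo) if w != 0]
print("selected feature indices:", selected)
```

Features with a non-zero loading are the ones "selected" by the penalty. Raising `lam` selects fewer features, and `lam = 0` gives an ordinary, non-sparse PLS component.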
We are interested in developing new multivariate methodologies to
- integrate multi-omics single cell data (scNM&T-seq in particular)
- integrate multi-omics time course data
- integrate genotype (SNP) data
mixOmics R toolkit package (www.mixOmics.org)
mixOmics is one of the few R packages dedicated to the integration of multiple ‘omics data (19 novel methodologies implemented so far, amongst which 13 were developed by our lab), with an increasing uptake from the research community. The package has been downloaded more than 67K times in 2019. Development is ongoing on interactive web interfaces and on efficient programming for large-scale studies. Check out our recent publication (Rohart et al. 2017b) and 50-min webinar overview of this project. The mixOmics team runs multi-day workshops that introduce multivariate projection-based methods for data integration using mixOmics; see our website www.mixOmics.org for news and tutorials.
Multivariate methods for microbiome studies
There are major statistical and computational challenges in analysing microbial communities that currently hinder the potential of microbiome research to substantially advance biomedical understanding. We are currently expanding mixMC to better characterise and understand important microbiome-host interactions. Some of our methods developments aim to address batch effects in microbiome experiments and to analyse sparse temporal sampling in time course studies.
We analyse microbiome datasets from our collaborators for a wide range of studies, including the role of the gut and oral microbiome in spondyloarthropathy diseases, the development of intestinal or salivary microbiota in toddlers and infants, and the gut-brain crosstalk in Huntington’s disease.
- Aleksandar Dakic – senior postdoctoral fellow, genomics data analyst, in collaboration with A/Prof Jess Mar (AIBN, UQ) and Prof Christine Wells (Centre for Stem Cell Systems)
- Al J Abadi – postdoctoral fellow, software developer and computational statistics
- [we are looking for a senior postdoctoral fellow in computational statistics and genomics! contact us for more details]
Higher Degree Research students
- Susie Ellul, PhD candidate ‘Multivariate methods for the analysis of longitudinal microbiome data’, UoM
- Isaac Virshup, PhD candidate ‘Finding patterns of biologically meaningful transcript expression by examining heterogenous sets of cells’, UoM with main supervisor Prof Christine Wells
- Eva Yiwen Wang, PhD candidate ‘Development of multivariate and integrative statistical methods to improve microbiome research outputs’, UoM
- Sibi Xue, MSc Statistics by coursework, UoM
- Yinghua Shen, MSc Statistics by coursework, UoM
We welcome any students and staff who are interested in statistical analysis of omics data and wish to attend our fortnightly group meetings!
- Florian Rohart – now data analyst at NTI
- Nicholas Matigian – now data analyst at QFAB Bioinformatics
- Benoit Gautier – now teacher in mathematics in France.
Alumni students (PhD)
- Aimee Hanson, PhD candidate ‘Lymphocyte receptors: Genomic structure and role in immune-mediated arthritis’ with main supervisor Prof Matt Brown (QUT) and Diamantina Institute, Faculty of Medicine, University of Queensland
- Farah Syeda Zahir, PhD candidate ‘Obesity paradox: Exploring the relationship between adiposity and mortality in persons with Cardiovascular Disease and/or Type 2 Diabetes Mellitus’, co-supervised with Dr Ahmed Medi (Diamantina Institute), School of Public Health, University of Queensland.
- Jasmin Straube ‘Development of statistical tools for integrating time course ‘omics’ data’ with co-supervisors Dr Emma Huang and Dr Anne Bernard, QFAB and University of Queensland.
- Ralph Patrick ‘Molecular interaction motifs in a system-wide network context: Computationally charting transient kinase-substrate phosphorylation events’ with main supervisor A/Prof Mikael Boden, University of Queensland.
- Amrit Singh ‘Blood biomarker panels of the late phase asthmatic response’ with main supervisor Prof Scott Tebbutt, University of British Columbia, Vancouver, Canada.
- Chao Liu ‘Computational analysis of DNA repair pathways in breast cancer’ with main supervisor Prof Mark Ragan, Institute for Molecular Bioscience, University of Queensland
Alumni students (Honours and MSc)
- Alana Butler, Master of Science (Bioinformatics), UoM, now research assistant at Monash University.
- Nicholas d’Arcy, Nicholas Mueller, University of Queensland
- Solange Pruilh, Zoe Welham, Vanessa Lakis, Priscilla Montfalet, Thom Cuddihy, Mourad Larbi, Jeff Coquery and Pierre Monget, who each did a research placement in our lab.
(refereed by editorial board)
Huang BE, Clifford D and Lê Cao K-A (2014). The surprising benefit of passive-aggressive behaviour at Christmas parties: being crowned king of the crackers. Medical Journal of Australia 201(11):694-6 (Christmas issue, awarded first prize, radio interview from ABC Darwin, mentioned in the podcast from Two Shrink Pod (episode 21, Dec 2017)).
Clifford D, Lê Cao K-A and Huang BE (2014). The statistician’s guide to a cracking good Christmas party. Significance 11(5):44-7 (Christmas issue, doi: 10.1111/j.1740-9713.2014.00784.x).
Awards and fellowships
2019 – 2022 Career Development Fellowship (CDF2) from the National Health and Medical Research Council (NHMRC) ‘Microbiome biomarkers of human disease: novel computational methods to facilitate therapeutic developments’, $483K.
2019 Moran Medal from the Australian Academy of Science for contribution in the past 10 years in Statistical sciences in Australia (early-career, biennial)
2015 – 2019 Career Development Fellowship (CDF1) from the National Health and Medical Research Council (NHMRC) ‘Development of statistical methodologies and application to clinical cancer studies’, $419K.
2009 Laurent-Duhamel triennial prize from the French Statistical Society for PhD thesis in Applied Statistics, Bordeaux, France.
Current funding (UoM)
2018 – 2019 Silicon Valley Community Foundation, HCA2-A-1708-02277, Multivariate computational methods for data integration of single cell assays. Role: CIA. $132K
2018 UoM Computational Biology Research Initiative seed funding. Towards the understanding of gut-brain crosstalk in Huntington’s disease. Role: CIA. $20K.
2018 – 2021 NHMRC Project Grant, GNT1142456. Enhancing host defence mechanisms in severe bacterial infections. Dr A Blumenthal, Prof B Venkatesh, Prof D Evans, Dr K-A Lê Cao, Prof G Ulett, A/Prof J Cohen. Role: CID. $837K
2018 – 2021 NHMRC Project Grant. GNT1144941. Understanding how azithromycin prevents exacerbations in severe asthma. Prof J Upham, Prof J Simpson, Dr K Baines, Dr K-A Lê Cao. Role: CID. $698K
2018 – 2019 ARC Special Research Initiative in Stem cells Centre of Excellence, Stem Cells Australia led by Prof M Little (UoM). Role: co-CI. $3M, 1 research fellow in Lê Cao group.
The application of our methods and software has directly resulted in four biomedical patents.
- Gandhi M, Keane C, Lê Cao K-A, Vari F (2015). A method of assessing prognosis of lymphoma. WO/2016/134416. Priority 23/02/2016
- Thomas R, Mehdi A, Lê Cao K-A (2014). Kits and methods for the diagnosis, treatment, prevention and monitoring of diabetes. PCT/AU2014/050415. Priority 18/06/2015
- Hill M, Shah A, Lê Cao K-A (2014). Blood Test for Throat Cancer. WO/2016/077881. Priority 17/11/2015
- Musso O, Desert R, Rohart F, Lê Cao K-A. Method for predicting the survival time of a patient suffering from hepatocellular carcinoma. EP17305436.2. Priority 12/04/2017
Past funding (University of Queensland)
2016 Translational Research Institute SPORE grant, Obesity-induced Barrett’s oesophagus and associated cancer: mechanisms and diagnostic tools. A/Prof M. Hill, Dr A. Barbour, Dr K-A. Lê Cao (CIC). $100K
2016 Translational Research Institute SPORE grant, Towards biomarkers for patient stratification in sepsis, Dr A. Blumenthal, Prof B. Venkatesh, A/Prof J Cohen, Dr K-A. Lê Cao, Dr D. Vagenas, Prof I. Frazer (CID). $80K
2014 – 2015 The Juvenile Diabetes Research Foundation (JDRF), 2-SRA-2015-306-Q-R, A genetic link between gut microbial flora and T1D susceptibility. Dr D. Zipris (University of Colorado) and co-CI from UQDI: Dr E. Hamilton-Williams, Dr J. Mullaney, A/Prof M. Hill, Dr K-A. Lê Cao (PI). $500K
2014 The Juvenile Diabetes Research Foundation (JDRF), 1-PNF-2014-153-A-V, Risk of diabetes progression in at-risk subjects with metabolic and inflammatory signatures. Prof R. Thomas (UQDI), K-A. Lê Cao et al. (PI). $110K
2014 UQ Major Equipment and Infrastructure, 2014000102, High throughput gene expression of patient samples via the Nanostring nCounter system. Prof M. Gandhi and 9 co-CI from UQDI, K-A. Lê Cao (CIJ). $169K
2014 – 2016 NHMRC Project Grants Funding, APP1058993, Blood biomarkers in Hodgkin Lymphoma. Prof M. Gandhi, Prof M. Fulham, A/Prof J. Trotman, Dr K-A. Lê Cao, Dr L. Berkahn. (CID). $513K
2013 – 2015 ARC Discovery Project, DP130100777. The Stemformatics gene expression compendium: development of multivariate statistical approaches for cross platform analyses. A/Prof C. Wells, Dr K-A. Lê Cao (CIB). $269K, shared postdoctoral fellow.
Our lab aims to inspire younger generations of budding statisticians, data analysts and computational biologists to advance the field of computational biostatistics.
All our members use GitHub and strive for reproducible research, see:
We are preparing a handbook about multivariate projection-based methods and how to apply them (using mixOmics!) to integrate biological data. Stay tuned!
2019 – We have developed a 16-week online course for University of Melbourne students called ‘Data fundamentals’ with Dr Sue Finch (Statistical Consulting Centre, School of Mathematics and Statistics). The course is offered every trimester. Have a look at this page if you wish to register; it is a fun course for learning how to work with data.
Since 2014 – We teach specialised workshops to introduce key concepts in multivariate statistics, with applications using the R software mixOmics. Our mixOmics web page provides numerous tutorials to apply the different multivariate integrative methods implemented in mixOmics.
We taught the introductory statistics course ‘Statistics for frightened bioresearchers’; lecture materials can be found here.
Below is a list of opportunities in our lab, including undergraduate and postgraduate research projects and scientific visits.
We are looking for self-motivated candidates in the field of computational statistics applied to high-throughput biological data, as data analysts and software developers. We are also looking for a long term (2 + 2 years) senior postdoc with opportunities for teaching as well as statistical methods development. Contact us!
We welcome undergraduate, Hons/MSc and PhD students willing to be part of the group to apply our methods to specific biological problems, or develop innovative computational methods at the forefront of ‘omics and microbiome data integration. There are plenty of projects to choose from across our research themes and cross-discipline projects. Some are listed here.
We welcome wet-lab researchers and assist them in acquiring the necessary skillsets to analyse their own data with our tools, and dry-lab researchers to collaborate on our many exciting projects.
Dr Sébastien Déjean stayed for 5 weeks with us in July 2018 and helped run a mixOmics workshop.
Prof Malu Calle Rosingana visited us for 4 weeks in January 2019.
Stijn Hawinkel, PhD candidate with Prof Olivier Thas (Ghent University), visited us for 3 months (March – May 2019).
Dr Olivier Chapleur and Ms Laetitia Cardona were back for 4 and 6 weeks from April 2019! They gave us a hand with our upcoming mixOmics workshop focusing on microbiome data analysis.
Mr Attila Csala, PhD candidate with Prof Aeilko Zwinderman (University of Amsterdam), will visit us for 4 months (Nov 2019 – March 2020).
@: kimanh.lecao[ at ]unimelb.edu.au
Ph: +61 3 8344 3971
We are located at:
Melbourne Integrative Genomics | Old microbiology building 184 ground floor | The University of Melbourne
Entrance is through Royal Parade, approximately at 30 Royal Parade (see map), next to the Kenneth Meyer building (Tram Route 19, Stop no. 11 from the city centre).
There is a phone in the reception area, with the contact numbers. Give us a buzz then (and if we don’t answer, call Andrew or Bobbie).