Our lab focuses on the development of computational methods, their applications in areas informed by biology, and the training of the new generation of computational biologists and data analysts. Our area of expertise is the integration of biological ‘omics data (transcriptomics, proteomics, metabolomics etc., as well as microbiome and metagenomics) with multivariate and dimension reduction methodologies, the selection of features or biomarkers in large biological data sets, and R software development. Our group provides critical collaborative expertise to biologists, bioinformaticians, statisticians and clinicians, and welcomes budding data analysts.
Our aim is to broadly enable scientific progress well beyond statistical development itself. We value creative thinking in statistical methodological development to address critical challenges arising from high throughput biological research.
More news (workshops and updates): www.mixOmics.org
Lab head: Dr Kim-Anh Lê Cao
Snr Lecturer, NHMRC Career Development Fellow
Melbourne Integrative Genomics (MIG) & School of Mathematics and Statistics
Building 184 ground floor | University of Melbourne | Parkville VIC 3010
@: kimanh.lecao[ at ]unimelb.edu.au | Ph: +61 3 8344 3971
Our lab specialises in computational methods and software development, as well as the application of our methods and tools to biological data sets generated by our collaborators.
Development of data integration methods using multivariate projection-based methodologies
Our dimension reduction methods are based on the Projection to Latent Structures algorithm (PLS, a term we prefer to Partial Least Squares regression, Wold et al. 2001), which we combine with LASSO regularization to identify important biological features or biomarkers in large-scale biological data sets. Our latest frameworks include DIABLO (Singh et al. in prep) to integrate multiple data sets measured on the same N samples (N-integration); MINT (Rohart et al. 2017a) to integrate independent studies measured on the same P variables / genes (P-integration); and mixMC (Lê Cao et al. 2016) for the multivariate analysis of microbial communities.
Specifically, we are currently developing novel multivariate methodologies to
- integrate multiple ‘omics time course data
- integrate genotype (SNP) data
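To give a feel for how a PLS-type projection can be combined with a LASSO-type penalty to select features, here is a minimal sketch in plain Python. It is an illustration of the general idea only, not the mixOmics implementation (which is written in R): the function names `soft_threshold` and `sparse_pls1_weights`, the toy data and the penalty value `lam` are all assumptions made for this example. For a single centred response, the first PLS weight vector is proportional to the covariance between each feature and the response; soft-thresholding that vector sets weakly associated features to exactly zero.

```python
import math


def soft_threshold(values, lam):
    """Elementwise soft-thresholding, the operator behind LASSO-type penalties."""
    out = []
    for v in values:
        if v > lam:
            out.append(v - lam)
        elif v < -lam:
            out.append(v + lam)
        else:
            out.append(0.0)  # small entries are set to exactly zero
    return out


def sparse_pls1_weights(X, y, lam):
    """First sparse PLS weight vector for one response (illustrative sketch).

    X is a list of rows (n samples x p features) with centred columns,
    y is a centred response of length n, lam is a hypothetical penalty.
    """
    n, p = len(X), len(X[0])
    # Covariance-like vector X^T y: one entry per feature.
    cov = [sum(X[i][j] * y[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(c * c for c in cov))
    if norm == 0.0:
        return [0.0] * p
    unit = [c / norm for c in cov]
    # Penalise the unit-norm weights, then rescale the survivors to unit norm.
    sparse = soft_threshold(unit, lam)
    sparse_norm = math.sqrt(sum(s * s for s in sparse))
    if sparse_norm == 0.0:
        return sparse
    return [s / sparse_norm for s in sparse]


# Toy data (made up): 4 samples, 3 features, columns and response centred.
X = [
    [-1.5, 0.5, -0.2],
    [-0.5, -0.5, 0.1],
    [0.5, 0.5, 0.0],
    [1.5, -0.5, 0.1],
]
y = [-3.0, -1.0, 1.0, 3.0]

weights = sparse_pls1_weights(X, y, lam=0.1)
selected = [j for j, w in enumerate(weights) if w != 0.0]
print("weights:", weights)
print("selected feature indices:", selected)
```

With this toy data the third feature covaries only weakly with the response, so its weight is shrunk to zero and only the first two features are selected. A larger `lam` gives a sparser weight vector.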
Development of the mixOmics R toolkit package (www.mixOmics.org)
mixOmics is one of the few R packages dedicated to the integration of multiple ‘omics data (19 novel methodologies implemented so far, 13 of which were developed by our lab), with increasing uptake from the research community. The package was downloaded more than 29K times in 2017 (R CRAN package download logs). Programming developments are ongoing for interactive web interfaces and efficient computation for large-scale studies. Check out our recent publication (Rohart et al. 2017b) and a poster that gives an overview of this large project. The mixOmics team runs multi-day workshops that introduce multivariate projection-based methods for data integration using mixOmics; see our website www.mixOmics.org for news and tutorials.
Development and application of multivariate methods for microbiome studies
There are major statistical and computational challenges in analysing microbial communities that currently hinder the potential of microbiome research to substantially advance biomedical understanding. We are currently expanding mixMC to better characterise and understand important microbiome-host interactions. Some of our method developments aim to address batch effects in microbiome experiments and to analyse sparsely sampled time course studies.
We analyse microbiome datasets from our collaborators for a wide range of studies, including the role of the gut and oral microbiome in spondyloarthropathy diseases, the development of intestinal or salivary microbiota in toddlers and infants, and the gut-brain crosstalk in Huntington’s disease.
- [insert your name here!] – computational genomics and modelling, 1-year, research fellow, applications closed.
- Al J Abadi – postdoctoral fellow, software developer and computational statistics
- Dr Zitong Li – postdoctoral fellow, statistical genomics.
- Dr Malathi Imiyage Dona – postdoctoral fellow, computational biology, in collaboration with the Centre for Stem Cell Systems led by Prof Christine Wells
Higher Degree Research students
- Isaac Virshup, PhD candidate ‘Finding patterns of biologically meaningful transcript expression by examining heterogenous sets of cells’, UoM with main supervisor Prof Christine Wells
- Eva Yiwen Wang, PhD candidate ‘Development of multivariate and integrative statistical methods to improve microbiome research outputs’, UoM
- Aimee Hanson, PhD candidate ‘Lymphocyte receptors: Genomic structure and role in immune-mediated arthritis’ with main supervisor Prof Matt Brown (QUT) and Diamantina Institute, Faculty of Medicine, University of Queensland
- Farah Syeda Zahir, PhD candidate ‘Obesity paradox: Exploring the relationship between adiposity and mortality in persons with Cardiovascular Disease and/or Type 2 Diabetes Mellitus’, co-supervised with Dr Ahmed Medi (Diamantina Institute), School of Public Health, University of Queensland.
- Camilla Fisher, Master of Science (Bioinformatics), UoM
- Alana Butler, Master of Science (Bioinformatics), UoM
Visiting members who participate in our lab meeting (anyone welcome)
- Dr Amy Loughman, Research Fellow, Deakin University
- Dr Alexandra Roth-Schulze, Postdoctoral Fellow, Papenfuss lab, WEHI
- Ms Geraldine Kong, PhD student, Hannan Lab (Florey institute)
- Florian Rohart – postdoctoral fellow, applied statistician, University of Queensland
- Nicholas Matigian – data analyst, University of Queensland
- Benoit Gautier – statistician, University of Queensland
Alumni students (PhD)
- Jasmin Straube ‘Development of statistical tools for integrating time course ‘omics’ data’ with co-supervisors Dr Emma Huang and Dr Anne Bernard, QFAB and University of Queensland.
- Ralph Patrick ‘Molecular interaction motifs in a system-wide network context: Computationally charting transient kinase-substrate phosphorylation events’ with main supervisor A/Prof Mikael Boden, University of Queensland.
- Amrit Singh ‘Blood biomarker panels of the late phase asthmatic response’ with main supervisor Prof Scott Tebbutt, University of British Columbia, Vancouver, Canada.
- Chao Liu ‘Computational analysis of DNA repair pathways in breast cancer’ with main supervisor Prof Mark Ragan, Institute for Molecular Bioscience, University of Queensland
Alumni students (Honours and MSc)
- Nicholas d’Arcy, Nicholas Mueller, University of Queensland
- Solange Pruilh, Zoe Welham, Vanessa Lakis, Priscilla Montfalet, Thom Cuddihy, Mourad Larbi, Jeff Coquery, Pierre Monget
(refereed by editorial board)
Huang BE, Clifford D and Lê Cao K-A (2014). The surprising benefit of passive-aggressive behaviour at Christmas parties: being crowned king of the crackers. Medical Journal of Australia 201(11):694-6 (Christmas issue, awarded first prize, radio interview from ABC Darwin, mentioned in the podcast from Two Shrink Pod, episode 21, Dec 2017).
Clifford D, Lê Cao K-A and Huang BE (2014). The statistician’s guide to a cracking good Christmas party. Significance 11(5):44-7 (Christmas issue, doi: 10.1111/j.1740-9713.2014.00784.x).
The application of our methods and software has directly resulted in four biomedical patents.
- Gandhi M, Keane C, Lê Cao K-A, Vari F (2015). A method of assessing prognosis of lymphoma. WO/2016/134416. Priority 23/02/2016
- Thomas R, Mehdi A, Lê Cao K-A (2014). Kits and methods for the diagnosis, treatment, prevention and monitoring of diabetes. PCT/AU2014/050415. Priority 18/06/2015
- Hill M, Shah A, Lê Cao K-A (2014). Blood Test for Throat Cancer. WO/2016/077881. Priority 17/11/2015
- Musso O, Desert R, Rohart F, Lê Cao K-A. Method for predicting the survival time of a patient suffering from hepatocellular Carcinoma. EP17305436.2. Priority 12/04/2017
Our lab aims to inspire younger generations of budding statisticians, data analysts and computational biologists to advance the field of computational biostatistics.
We are currently preparing a MOOC (Massive Open Online Course) on ‘Data fundamentals’.
We teach specialised workshops to introduce key concepts in multivariate statistics, with applications using the R software mixOmics. Our mixOmics web page provides numerous tutorials on applying the different multivariate integrative methods implemented in mixOmics.
We taught the introductory statistics course ‘Statistics for frightened bioresearchers’; lecture materials can be found here.
Below is a list of opportunities in our lab, including undergraduate and postgraduate research projects and scientific visits.
Research fellow positions
The application for the following research position is now closed:
- Genomics data analysis (1 year, start July 2018), Level A, collaboration between University of Queensland & Melbourne
We welcome undergraduate, Honours/MSc and PhD students who wish to join the group to apply our methods to specific biological problems, or to develop innovative computational methods at the forefront of ‘omics and microbiome data integration. There are plenty of projects to choose from across our research themes and cross-disciplinary collaborations. Some are listed here.
We welcome wet-lab researchers and assist them in acquiring the necessary skillsets to analyse their own data with our tools, and dry-lab researchers to collaborate on our many exciting projects.
@: kimanh.lecao[ at ]unimelb.edu.au
Ph: +61 3 8344 3971
We are located at:
Melbourne Integrative Genomics | Old microbiology building 184 | The University of Melbourne
Entrance through Royal Parade, approximately at 30 Royal Parade (see map), between the Kenneth Meyer and Doherty buildings.
There is a phone in the reception area to give us a buzz.