Global picoplankton biogeography revealed by metagenomic and climatic data integration

Vini Salazar is reaching the end of his PhD candidature (and has already secured a continuing position at Melbourne Bioinformatics, UoM 🎉). This preprint is one of the major contributions of his PhD thesis (he also published Metaphor, a tool for assembling metagenomes)!

  • We compiled and bioinformatically processed 1454 metagenomes from multiple sampling consortia, resulting in the largest integrated surface ocean metagenome analysis to date
  • We defined 10 biogeographical provinces based on metagenomic data
  • We used machine learning and omics data integration techniques to characterise the environmental, taxonomical, and functional features of these provinces

Global picoplankton biogeography revealed by metagenomic and climatic data integration. Vinícius W. Salazar, Heroen Verbruggen, Vanessa Rossetto Marcelino, Kim-Anh Lê Cao.

Abstract
Microbial plankton play fundamental roles in biogeochemical cycles, driving nutrient cycling that influences the global climate and supports life on Earth. Picoplankton are the smallest and most abundant planktonic organisms. The distribution and ecology of these organisms is determined by environmental factors and their biogeography is largely shaped by basin-scale patterns of physicochemical composition of ocean waters. The increased availability of high-throughput sequencing data of microbial communities has enabled the description of how the global oceans are partitioned into distinct microbial biogeographical provinces. However, the key attributes associated with such provinces are still unclear. Here we present a model of picoplankton biogeography based on 1454 metagenomes from multiple sampling consortia, resulting in the largest integrated surface ocean metagenome analysis to date. We identify ten distinct groups based on metagenomic dissimilarity, divided into three categories: polar (Arctic and Antarctic), temperate (coastal temperate, temperate/subtropical transition, oceanic temperate, Mediterranean-like) and tropical (tropical low nutrient, tropical high nutrient, subtropical oceanic gyres). Using machine learning and omics data integration techniques, we predict province areas across the surface oceans and describe their environmental, taxonomical, and functional features. We quantify the relationship between environmental factors and each biogeographical province, identify their main representative taxa and the importance of carbon degradation and antimicrobial resistance pathways in functional community composition, and discuss implications for establishing a model for global picoplankton biogeography.
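
For readers curious what "groups based on metagenomic dissimilarity" can look like in practice, here is a minimal toy sketch in Python. It is not the preprint's pipeline: the abstract does not name a dissimilarity metric or clustering method, so Bray-Curtis dissimilarity and average-linkage clustering are used here purely as common stand-ins. The abundance table, the function names (`bray_curtis`, `average_linkage`) and the choice of three groups are all made up for illustration.

```python
from itertools import combinations


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two non-negative abundance profiles."""
    total = sum(a) + sum(b)
    if total == 0:
        return 0.0  # two empty profiles are treated as identical
    return sum(abs(x - y) for x, y in zip(a, b)) / total


def average_linkage(profiles, n_groups):
    """Merge the two closest clusters until n_groups remain.

    Returns a sorted list of clusters, each a sorted list of sample indices.
    ASSUMPTION: average linkage is an illustrative choice, not the
    method used in the preprint.
    """
    n = len(profiles)
    # Pairwise dissimilarities between individual samples, keyed by (i, j), i < j.
    dist = {
        (i, j): bray_curtis(profiles[i], profiles[j])
        for i, j in combinations(range(n), 2)
    }

    def cluster_distance(c1, c2):
        # Mean dissimilarity over all cross-cluster sample pairs.
        pairs = [(min(i, j), max(i, j)) for i in c1 for j in c2]
        return sum(dist[p] for p in pairs) / len(pairs)

    clusters = [[i] for i in range(n)]
    while len(clusters) > n_groups:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: cluster_distance(clusters[ab[0]], clusters[ab[1]]),
        )
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return sorted(clusters)


# Hypothetical abundance table: 6 samples (rows) x 3 taxa (columns).
samples = [
    [90, 5, 5],
    [85, 10, 5],
    [5, 90, 5],
    [10, 80, 10],
    [5, 5, 90],
    [10, 10, 80],
]

groups = average_linkage(samples, n_groups=3)
print(groups)
```

On this toy table, samples dominated by the same taxon end up in the same group. The real analysis works on 1454 metagenomes and then uses machine learning to predict province areas from environmental data, which this sketch does not attempt.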