
Unraveling the functional dark matter through global metagenomics


Bibliographic Details
Main authors: Pavlopoulos, Georgios A.; Baltoumas, Fotis A.; Liu, Sirui; Selvitopi, Oguz; Camargo, Antonio Pedro; Nayfach, Stephen; Azad, Ariful; Roux, Simon; Call, Lee; Ivanova, Natalia N.; Chen, I. Min; Paez-Espino, David; Karatzas, Evangelos; Iliopoulos, Ioannis; Konstantinidis, Konstantinos; Tiedje, James M.; Pett-Ridge, Jennifer; Baker, David; Visel, Axel; Ouzounis, Christos A.; Ovchinnikov, Sergey; Buluç, Aydin; Kyrpides, Nikos C.
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10584684/
https://www.ncbi.nlm.nih.gov/pubmed/37821698
http://dx.doi.org/10.1038/s41586-023-06583-7
collection PubMed
description Metagenomes encode an enormous diversity of proteins, reflecting a multiplicity of functions and activities(1,2). Exploration of this vast sequence space has been limited to a comparative analysis against reference microbial genomes and protein families derived from those genomes. Here, to examine the scale of yet untapped functional diversity beyond what is currently possible through the lens of reference genomes, we develop a computational approach to generate reference-free protein families from the sequence space in metagenomes. We analyse 26,931 metagenomes and identify 1.17 billion protein sequences longer than 35 amino acids with no similarity to any sequences from 102,491 reference genomes or the Pfam database(3). Using massively parallel graph-based clustering, we group these proteins into 106,198 novel sequence clusters with more than 100 members, doubling the number of protein families obtained from the reference genomes clustered using the same approach. We annotate these families on the basis of their taxonomic, habitat, geographical and gene neighbourhood distributions and, where sufficient sequence diversity is available, predict protein three-dimensional models, revealing novel structures. Overall, our results uncover an enormously diverse functional space, highlighting the importance of further exploring the microbial functional dark matter.
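The pipeline the abstract describes (keep proteins longer than 35 amino acids with no reference or Pfam hit, link them by sequence similarity, cluster the resulting graph, and keep clusters with more than 100 members) can be illustrated with a toy sketch. This is not the authors' implementation: the record does not name the clustering algorithm, so Markov clustering (MCL) stands in here as one common graph-based method. The helper names (`select_candidates`, `mcl`, `cluster_families`), the `ref_hits` set and the similarity `hits` are hypothetical, and both inputs are assumed to come from an upstream homology search that is out of scope.

```python
from typing import Dict, Iterable, List, Set, Tuple

Matrix = List[List[float]]


def select_candidates(seqs: Dict[str, str], ref_hits: Set[str], min_len: int = 35) -> List[str]:
    """Keep proteins longer than `min_len` residues that have no reference hit.

    `ref_hits` is a hypothetical, precomputed set of IDs that matched a
    reference genome or Pfam; computing it is outside this sketch.
    """
    return sorted(pid for pid, s in seqs.items() if len(s) > min_len and pid not in ref_hits)


def _col_normalize(m: Matrix) -> Matrix:
    # Scale every column to sum to 1 (a column-stochastic matrix).
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)] for i in range(n)]


def mcl(n: int, edges: Iterable[Tuple[int, int]], inflation: float = 2.0,
        max_iter: int = 100, prune: float = 1e-6) -> List[Set[int]]:
    """Toy Markov clustering (MCL) on a small unweighted, undirected graph."""
    m = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # self-loops
    for a, b in edges:
        m[a][b] = m[b][a] = 1.0
    m = _col_normalize(m)
    for _ in range(max_iter):
        # Expansion: square the matrix (flow along random walks of length 2).
        sq = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
        # Inflation: raise entries to a power, drop tiny ones, renormalize.
        powered = [[v ** inflation for v in row] for row in sq]
        new = _col_normalize([[v if v > prune else 0.0 for v in row] for row in powered])
        converged = all(abs(new[i][j] - m[i][j]) < 1e-9 for i in range(n) for j in range(n))
        m = new
        if converged:
            break
    # Each row that still holds mass is an attractor; its nonzero columns form a cluster.
    clusters = {frozenset(j for j in range(n) if m[i][j] > 0) for i in range(n)}
    return [set(c) for c in clusters if c]


def cluster_families(cands: List[str], hits: Iterable[Tuple[str, str]],
                     min_members: int = 100) -> List[List[str]]:
    """Cluster candidates by similarity hits; keep clusters with > min_members."""
    idx = {pid: i for i, pid in enumerate(cands)}
    # Ignore hits that touch proteins removed by the candidate filter.
    edges = [(idx[a], idx[b]) for a, b in hits if a in idx and b in idx]
    clusters = mcl(len(cands), edges)
    return sorted(sorted(cands[i] for i in c) for c in clusters if len(c) > min_members)


# Usage with made-up toy data (a low threshold so the example stays tiny).
seqs = {"p1": "M" * 40, "p2": "M" * 50, "p3": "M" * 36, "p4": "M" * 60,
        "p5": "M" * 45, "p6": "M" * 20, "p7": "M" * 80}
cands = select_candidates(seqs, ref_hits={"p7"})  # p6 is too short, p7 has a reference hit
hits = [("p1", "p2"), ("p2", "p3"), ("p1", "p3"), ("p4", "p5"), ("p5", "p6")]
print(cluster_families(cands, hits, min_members=2))  # [['p1', 'p2', 'p3']]
```

The dense matrices make each iteration cubic in the number of proteins, so this only suits toy inputs; clustering 1.17 billion sequences, as in the abstract, calls for a massively parallel, sparse implementation. In MCL the `inflation` parameter controls granularity: higher values tend to split the graph into more, smaller clusters.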
format Online
Article
Text
id pubmed-10584684
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-10584684 2023-10-20 Nature Article
Nature Publishing Group UK 2023-10-11 2023 /pmc/articles/PMC10584684/ /pubmed/37821698 http://dx.doi.org/10.1038/s41586-023-06583-7 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/).
title Unraveling the functional dark matter through global metagenomics
topic Article