Unraveling the functional dark matter through global metagenomics
Metagenomes encode an enormous diversity of proteins, reflecting a multiplicity of functions and activities(1,2). Exploration of this vast sequence space has been limited to a comparative analysis against reference microbial genomes and protein families derived from those genomes. Here, to examine t...
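Main authors: | Pavlopoulos, Georgios A.; Baltoumas, Fotis A.; Liu, Sirui; Selvitopi, Oguz; Camargo, Antonio Pedro; Nayfach, Stephen; Azad, Ariful; Roux, Simon; Call, Lee; Ivanova, Natalia N.; Chen, I. Min; Paez-Espino, David; Karatzas, Evangelos; Iliopoulos, Ioannis; Konstantinidis, Konstantinos; Tiedje, James M.; Pett-Ridge, Jennifer; Baker, David; Visel, Axel; Ouzounis, Christos A.; Ovchinnikov, Sergey; Buluç, Aydin; Kyrpides, Nikos C. |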
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10584684/ https://www.ncbi.nlm.nih.gov/pubmed/37821698 http://dx.doi.org/10.1038/s41586-023-06583-7 |
_version_ | 1785122792995291136 |
---|---|
author | Pavlopoulos, Georgios A. Baltoumas, Fotis A. Liu, Sirui Selvitopi, Oguz Camargo, Antonio Pedro Nayfach, Stephen Azad, Ariful Roux, Simon Call, Lee Ivanova, Natalia N. Chen, I. Min Paez-Espino, David Karatzas, Evangelos Iliopoulos, Ioannis Konstantinidis, Konstantinos Tiedje, James M. Pett-Ridge, Jennifer Baker, David Visel, Axel Ouzounis, Christos A. Ovchinnikov, Sergey Buluç, Aydin Kyrpides, Nikos C. |
author_sort | Pavlopoulos, Georgios A. |
collection | PubMed |
description | Metagenomes encode an enormous diversity of proteins, reflecting a multiplicity of functions and activities(1,2). Exploration of this vast sequence space has been limited to a comparative analysis against reference microbial genomes and protein families derived from those genomes. Here, to examine the scale of yet untapped functional diversity beyond what is currently possible through the lens of reference genomes, we develop a computational approach to generate reference-free protein families from the sequence space in metagenomes. We analyse 26,931 metagenomes and identify 1.17 billion protein sequences longer than 35 amino acids with no similarity to any sequences from 102,491 reference genomes or the Pfam database(3). Using massively parallel graph-based clustering, we group these proteins into 106,198 novel sequence clusters with more than 100 members, doubling the number of protein families obtained from the reference genomes clustered using the same approach. We annotate these families on the basis of their taxonomic, habitat, geographical and gene neighbourhood distributions and, where sufficient sequence diversity is available, predict protein three-dimensional models, revealing novel structures. Overall, our results uncover an enormously diverse functional space, highlighting the importance of further exploring the microbial functional dark matter. |
format | Online Article Text |
id | pubmed-10584684 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-10584684 2023-10-20 Unraveling the functional dark matter through global metagenomics Nature Article Nature Publishing Group UK 2023-10-11 2023 /pmc/articles/PMC10584684/ /pubmed/37821698 http://dx.doi.org/10.1038/s41586-023-06583-7 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. |
title | Unraveling the functional dark matter through global metagenomics |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10584684/ https://www.ncbi.nlm.nih.gov/pubmed/37821698 http://dx.doi.org/10.1038/s41586-023-06583-7 |