Unifying the known and unknown microbial coding sequence space
Genes of unknown function are among the biggest challenges in molecular biology, especially in microbial systems, where 40–60% of the predicted genes are unknown. Despite previous attempts, systematic approaches to include the unknown fraction into analytical workflows are still lacking. Here, we pr...
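Main Authors: | Vanni, Chiara; Schechter, Matthew S; Acinas, Silvia G; Barberán, Albert; Buttigieg, Pier Luigi; Casamayor, Emilio O; Delmont, Tom O; Duarte, Carlos M; Eren, A Murat; Finn, Robert D; Kottmann, Renzo; Mitchell, Alex; Sánchez, Pablo; Siren, Kimmo; Steinegger, Martin; Gloeckner, Frank Oliver; Fernàndez-Guerra, Antonio |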
---|---|
Format: | Online Article Text |
Language: | English |
Published: | eLife Sciences Publications, Ltd 2022 |
Subjects: | Computational and Systems Biology |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9132574/ https://www.ncbi.nlm.nih.gov/pubmed/35356891 http://dx.doi.org/10.7554/eLife.67667 |
_version_ | 1784713409471709184 |
---|---|
author | Vanni, Chiara Schechter, Matthew S Acinas, Silvia G Barberán, Albert Buttigieg, Pier Luigi Casamayor, Emilio O Delmont, Tom O Duarte, Carlos M Eren, A Murat Finn, Robert D Kottmann, Renzo Mitchell, Alex Sánchez, Pablo Siren, Kimmo Steinegger, Martin Gloeckner, Frank Oliver Fernàndez-Guerra, Antonio |
author_facet | Vanni, Chiara Schechter, Matthew S Acinas, Silvia G Barberán, Albert Buttigieg, Pier Luigi Casamayor, Emilio O Delmont, Tom O Duarte, Carlos M Eren, A Murat Finn, Robert D Kottmann, Renzo Mitchell, Alex Sánchez, Pablo Siren, Kimmo Steinegger, Martin Gloeckner, Frank Oliver Fernàndez-Guerra, Antonio |
author_sort | Vanni, Chiara |
collection | PubMed |
description | Genes of unknown function are among the biggest challenges in molecular biology, especially in microbial systems, where 40–60% of the predicted genes are unknown. Despite previous attempts, systematic approaches to include the unknown fraction into analytical workflows are still lacking. Here, we present a conceptual framework, its translation into the computational workflow AGNOSTOS and a demonstration on how we can bridge the known-unknown gap in genomes and metagenomes. By analyzing 415,971,742 genes predicted from 1749 metagenomes and 28,941 bacterial and archaeal genomes, we quantify the extent of the unknown fraction, its diversity, and its relevance across multiple organisms and environments. The unknown sequence space is exceptionally diverse, phylogenetically more conserved than the known fraction and predominantly taxonomically restricted at the species level. From the 71 M genes identified to be of unknown function, we compiled a collection of 283,874 lineage-specific genes of unknown function for Cand. Patescibacteria (also known as Candidate Phyla Radiation, CPR), which provides a significant resource to expand our understanding of their unusual biology. Finally, by identifying a target gene of unknown function for antibiotic resistance, we demonstrate how we can enable the generation of hypotheses that can be used to augment experimental data. |
format | Online Article Text |
id | pubmed-9132574 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | eLife Sciences Publications, Ltd |
record_format | MEDLINE/PubMed |
spelling | pubmed-9132574 2022-05-26 Unifying the known and unknown microbial coding sequence space Vanni, Chiara Schechter, Matthew S Acinas, Silvia G Barberán, Albert Buttigieg, Pier Luigi Casamayor, Emilio O Delmont, Tom O Duarte, Carlos M Eren, A Murat Finn, Robert D Kottmann, Renzo Mitchell, Alex Sánchez, Pablo Siren, Kimmo Steinegger, Martin Gloeckner, Frank Oliver Fernàndez-Guerra, Antonio eLife Computational and Systems Biology Genes of unknown function are among the biggest challenges in molecular biology, especially in microbial systems, where 40–60% of the predicted genes are unknown. Despite previous attempts, systematic approaches to include the unknown fraction into analytical workflows are still lacking. Here, we present a conceptual framework, its translation into the computational workflow AGNOSTOS and a demonstration on how we can bridge the known-unknown gap in genomes and metagenomes. By analyzing 415,971,742 genes predicted from 1749 metagenomes and 28,941 bacterial and archaeal genomes, we quantify the extent of the unknown fraction, its diversity, and its relevance across multiple organisms and environments. The unknown sequence space is exceptionally diverse, phylogenetically more conserved than the known fraction and predominantly taxonomically restricted at the species level. From the 71 M genes identified to be of unknown function, we compiled a collection of 283,874 lineage-specific genes of unknown function for Cand. Patescibacteria (also known as Candidate Phyla Radiation, CPR), which provides a significant resource to expand our understanding of their unusual biology. Finally, by identifying a target gene of unknown function for antibiotic resistance, we demonstrate how we can enable the generation of hypotheses that can be used to augment experimental data. 
eLife Sciences Publications, Ltd 2022-03-31 /pmc/articles/PMC9132574/ /pubmed/35356891 http://dx.doi.org/10.7554/eLife.67667 Text en © 2022, Vanni et al https://creativecommons.org/licenses/by/4.0/ This article is distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use and redistribution provided that the original author and source are credited. |
spellingShingle | Computational and Systems Biology Vanni, Chiara Schechter, Matthew S Acinas, Silvia G Barberán, Albert Buttigieg, Pier Luigi Casamayor, Emilio O Delmont, Tom O Duarte, Carlos M Eren, A Murat Finn, Robert D Kottmann, Renzo Mitchell, Alex Sánchez, Pablo Siren, Kimmo Steinegger, Martin Gloeckner, Frank Oliver Fernàndez-Guerra, Antonio Unifying the known and unknown microbial coding sequence space |
title | Unifying the known and unknown microbial coding sequence space |
title_full | Unifying the known and unknown microbial coding sequence space |
title_fullStr | Unifying the known and unknown microbial coding sequence space |
title_full_unstemmed | Unifying the known and unknown microbial coding sequence space |
title_short | Unifying the known and unknown microbial coding sequence space |
title_sort | unifying the known and unknown microbial coding sequence space |
topic | Computational and Systems Biology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9132574/ https://www.ncbi.nlm.nih.gov/pubmed/35356891 http://dx.doi.org/10.7554/eLife.67667 |
work_keys_str_mv | AT vannichiara unifyingtheknownandunknownmicrobialcodingsequencespace AT schechtermatthews unifyingtheknownandunknownmicrobialcodingsequencespace AT acinassilviag unifyingtheknownandunknownmicrobialcodingsequencespace AT barberanalbert unifyingtheknownandunknownmicrobialcodingsequencespace AT buttigiegpierluigi unifyingtheknownandunknownmicrobialcodingsequencespace AT casamayoremilioo unifyingtheknownandunknownmicrobialcodingsequencespace AT delmonttomo unifyingtheknownandunknownmicrobialcodingsequencespace AT duartecarlosm unifyingtheknownandunknownmicrobialcodingsequencespace AT erenamurat unifyingtheknownandunknownmicrobialcodingsequencespace AT finnrobertd unifyingtheknownandunknownmicrobialcodingsequencespace AT kottmannrenzo unifyingtheknownandunknownmicrobialcodingsequencespace AT mitchellalex unifyingtheknownandunknownmicrobialcodingsequencespace AT sanchezpablo unifyingtheknownandunknownmicrobialcodingsequencespace AT sirenkimmo unifyingtheknownandunknownmicrobialcodingsequencespace AT steineggermartin unifyingtheknownandunknownmicrobialcodingsequencespace AT gloecknerfrankoliver unifyingtheknownandunknownmicrobialcodingsequencespace AT fernandezguerraantonio unifyingtheknownandunknownmicrobialcodingsequencespace |