Unifying the known and unknown microbial coding sequence space

Genes of unknown function are among the biggest challenges in molecular biology, especially in microbial systems, where 40–60% of the predicted genes are unknown. Despite previous attempts, systematic approaches to include the unknown fraction into analytical workflows are still lacking. Here, we pr...

Bibliographic Details
Main authors: Vanni, Chiara; Schechter, Matthew S; Acinas, Silvia G; Barberán, Albert; Buttigieg, Pier Luigi; Casamayor, Emilio O; Delmont, Tom O; Duarte, Carlos M; Eren, A Murat; Finn, Robert D; Kottmann, Renzo; Mitchell, Alex; Sánchez, Pablo; Siren, Kimmo; Steinegger, Martin; Gloeckner, Frank Oliver; Fernàndez-Guerra, Antonio
Format: Online Article Text
Language: English
Published: eLife Sciences Publications, Ltd 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9132574/
https://www.ncbi.nlm.nih.gov/pubmed/35356891
http://dx.doi.org/10.7554/eLife.67667
_version_ 1784713409471709184
author Vanni, Chiara
Schechter, Matthew S
Acinas, Silvia G
Barberán, Albert
Buttigieg, Pier Luigi
Casamayor, Emilio O
Delmont, Tom O
Duarte, Carlos M
Eren, A Murat
Finn, Robert D
Kottmann, Renzo
Mitchell, Alex
Sánchez, Pablo
Siren, Kimmo
Steinegger, Martin
Gloeckner, Frank Oliver
Fernàndez-Guerra, Antonio
author_facet Vanni, Chiara
Schechter, Matthew S
Acinas, Silvia G
Barberán, Albert
Buttigieg, Pier Luigi
Casamayor, Emilio O
Delmont, Tom O
Duarte, Carlos M
Eren, A Murat
Finn, Robert D
Kottmann, Renzo
Mitchell, Alex
Sánchez, Pablo
Siren, Kimmo
Steinegger, Martin
Gloeckner, Frank Oliver
Fernàndez-Guerra, Antonio
author_sort Vanni, Chiara
collection PubMed
description Genes of unknown function are among the biggest challenges in molecular biology, especially in microbial systems, where 40–60% of the predicted genes are unknown. Despite previous attempts, systematic approaches to include the unknown fraction into analytical workflows are still lacking. Here, we present a conceptual framework, its translation into the computational workflow AGNOSTOS and a demonstration on how we can bridge the known-unknown gap in genomes and metagenomes. By analyzing 415,971,742 genes predicted from 1749 metagenomes and 28,941 bacterial and archaeal genomes, we quantify the extent of the unknown fraction, its diversity, and its relevance across multiple organisms and environments. The unknown sequence space is exceptionally diverse, phylogenetically more conserved than the known fraction and predominantly taxonomically restricted at the species level. From the 71 M genes identified to be of unknown function, we compiled a collection of 283,874 lineage-specific genes of unknown function for Cand. Patescibacteria (also known as Candidate Phyla Radiation, CPR), which provides a significant resource to expand our understanding of their unusual biology. Finally, by identifying a target gene of unknown function for antibiotic resistance, we demonstrate how we can enable the generation of hypotheses that can be used to augment experimental data.
format Online
Article
Text
id pubmed-9132574
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher eLife Sciences Publications, Ltd
record_format MEDLINE/PubMed
spelling pubmed-91325742022-05-26 Unifying the known and unknown microbial coding sequence space Vanni, Chiara Schechter, Matthew S Acinas, Silvia G Barberán, Albert Buttigieg, Pier Luigi Casamayor, Emilio O Delmont, Tom O Duarte, Carlos M Eren, A Murat Finn, Robert D Kottmann, Renzo Mitchell, Alex Sánchez, Pablo Siren, Kimmo Steinegger, Martin Gloeckner, Frank Oliver Fernàndez-Guerra, Antonio eLife Computational and Systems Biology Genes of unknown function are among the biggest challenges in molecular biology, especially in microbial systems, where 40–60% of the predicted genes are unknown. Despite previous attempts, systematic approaches to include the unknown fraction into analytical workflows are still lacking. Here, we present a conceptual framework, its translation into the computational workflow AGNOSTOS and a demonstration on how we can bridge the known-unknown gap in genomes and metagenomes. By analyzing 415,971,742 genes predicted from 1749 metagenomes and 28,941 bacterial and archaeal genomes, we quantify the extent of the unknown fraction, its diversity, and its relevance across multiple organisms and environments. The unknown sequence space is exceptionally diverse, phylogenetically more conserved than the known fraction and predominantly taxonomically restricted at the species level. From the 71 M genes identified to be of unknown function, we compiled a collection of 283,874 lineage-specific genes of unknown function for Cand. Patescibacteria (also known as Candidate Phyla Radiation, CPR), which provides a significant resource to expand our understanding of their unusual biology. Finally, by identifying a target gene of unknown function for antibiotic resistance, we demonstrate how we can enable the generation of hypotheses that can be used to augment experimental data. 
eLife Sciences Publications, Ltd 2022-03-31 /pmc/articles/PMC9132574/ /pubmed/35356891 http://dx.doi.org/10.7554/eLife.67667 Text en © 2022, Vanni et al https://creativecommons.org/licenses/by/4.0/ This article is distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use and redistribution provided that the original author and source are credited.
spellingShingle Computational and Systems Biology
Vanni, Chiara
Schechter, Matthew S
Acinas, Silvia G
Barberán, Albert
Buttigieg, Pier Luigi
Casamayor, Emilio O
Delmont, Tom O
Duarte, Carlos M
Eren, A Murat
Finn, Robert D
Kottmann, Renzo
Mitchell, Alex
Sánchez, Pablo
Siren, Kimmo
Steinegger, Martin
Gloeckner, Frank Oliver
Fernàndez-Guerra, Antonio
Unifying the known and unknown microbial coding sequence space
title Unifying the known and unknown microbial coding sequence space
title_full Unifying the known and unknown microbial coding sequence space
title_fullStr Unifying the known and unknown microbial coding sequence space
title_full_unstemmed Unifying the known and unknown microbial coding sequence space
title_short Unifying the known and unknown microbial coding sequence space
title_sort unifying the known and unknown microbial coding sequence space
topic Computational and Systems Biology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9132574/
https://www.ncbi.nlm.nih.gov/pubmed/35356891
http://dx.doi.org/10.7554/eLife.67667
work_keys_str_mv AT vannichiara unifyingtheknownandunknownmicrobialcodingsequencespace
AT schechtermatthews unifyingtheknownandunknownmicrobialcodingsequencespace
AT acinassilviag unifyingtheknownandunknownmicrobialcodingsequencespace
AT barberanalbert unifyingtheknownandunknownmicrobialcodingsequencespace
AT buttigiegpierluigi unifyingtheknownandunknownmicrobialcodingsequencespace
AT casamayoremilioo unifyingtheknownandunknownmicrobialcodingsequencespace
AT delmonttomo unifyingtheknownandunknownmicrobialcodingsequencespace
AT duartecarlosm unifyingtheknownandunknownmicrobialcodingsequencespace
AT erenamurat unifyingtheknownandunknownmicrobialcodingsequencespace
AT finnrobertd unifyingtheknownandunknownmicrobialcodingsequencespace
AT kottmannrenzo unifyingtheknownandunknownmicrobialcodingsequencespace
AT mitchellalex unifyingtheknownandunknownmicrobialcodingsequencespace
AT sanchezpablo unifyingtheknownandunknownmicrobialcodingsequencespace
AT sirenkimmo unifyingtheknownandunknownmicrobialcodingsequencespace
AT steineggermartin unifyingtheknownandunknownmicrobialcodingsequencespace
AT gloecknerfrankoliver unifyingtheknownandunknownmicrobialcodingsequencespace
AT fernandezguerraantonio unifyingtheknownandunknownmicrobialcodingsequencespace