A most wanted list of conserved microbial protein families with no known domains

The number and proportion of genes with no known function are growing rapidly. To quantify this phenomenon and provide criteria for prioritizing genes for functional characterization, we developed a bioinformatics pipeline that identifies robustly defined protein families with no annotated domains,...

Bibliographic Details
Main Authors: Wyman, Stacia K., Avila-Herrera, Aram, Nayfach, Stephen, Pollard, Katherine S.
Format: Online Article Text
Language: English
Published: Public Library of Science 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6192648/
https://www.ncbi.nlm.nih.gov/pubmed/30332487
http://dx.doi.org/10.1371/journal.pone.0205749
_version_ 1783363938493136896
author Wyman, Stacia K.
Avila-Herrera, Aram
Nayfach, Stephen
Pollard, Katherine S.
collection PubMed
description The number and proportion of genes with no known function are growing rapidly. To quantify this phenomenon and provide criteria for prioritizing genes for functional characterization, we developed a bioinformatics pipeline that identifies robustly defined protein families with no annotated domains, ranks these with respect to phylogenetic breadth, and identifies them in metagenomics data. We applied this approach to 271 965 protein families from the SFams database and discovered many with no functional annotation, including >118 000 families lacking any known protein domain. From these, we prioritized 6 668 conserved protein families with at least three sequences from organisms in at least two distinct classes. These Function Unknown Families (FUnkFams) are present in Tara Oceans Expedition and Human Microbiome Project metagenomes, with distributions associated with sampling environment. Our findings highlight the extent of functional novelty in sequence databases and establish an approach for creating a “most wanted” list of genes to prioritize for further characterization.
format Online
Article
Text
id pubmed-6192648
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-6192648 2018-11-05 PLoS One Research Article Public Library of Science 2018-10-17 /pmc/articles/PMC6192648/ /pubmed/30332487 http://dx.doi.org/10.1371/journal.pone.0205749 Text en © 2018 Wyman et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title A most wanted list of conserved microbial protein families with no known domains
topic Research Article
work_keys_str_mv AT wymanstaciak amostwantedlistofconservedmicrobialproteinfamilieswithnoknowndomains
AT avilaherreraaram amostwantedlistofconservedmicrobialproteinfamilieswithnoknowndomains
AT nayfachstephen amostwantedlistofconservedmicrobialproteinfamilieswithnoknowndomains
AT pollardkatherines amostwantedlistofconservedmicrobialproteinfamilieswithnoknowndomains
AT wymanstaciak mostwantedlistofconservedmicrobialproteinfamilieswithnoknowndomains
AT avilaherreraaram mostwantedlistofconservedmicrobialproteinfamilieswithnoknowndomains
AT nayfachstephen mostwantedlistofconservedmicrobialproteinfamilieswithnoknowndomains
AT pollardkatherines mostwantedlistofconservedmicrobialproteinfamilieswithnoknowndomains
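
The prioritization rule stated in the abstract (families with no annotated domain, at least three sequences, and organisms from at least two distinct classes) can be illustrated with a minimal sketch. This is not the authors' pipeline: the `Family` record, the function names, and the use of the distinct-class count as a stand-in for "phylogenetic breadth" are all assumptions made for illustration, and the example data are invented.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Family:
    """Hypothetical record for one protein family (not the SFams schema)."""
    family_id: str
    # One taxonomic class label per member sequence.
    member_classes: List[str]
    # Identifiers of annotated domains; empty means no known domain.
    domains: List[str] = field(default_factory=list)


def is_candidate(fam: Family, min_sequences: int = 3, min_classes: int = 2) -> bool:
    """Apply the thresholds quoted in the abstract to a single family."""
    if fam.domains:
        return False  # has at least one annotated domain
    if len(fam.member_classes) < min_sequences:
        return False  # too few member sequences
    return len(set(fam.member_classes)) >= min_classes


def rank_by_breadth(families: List[Family]) -> List[Family]:
    """Keep candidates and order them by number of distinct classes.

    Assumption: distinct-class count is used here as a simple proxy for
    phylogenetic breadth; the paper may define breadth differently.
    """
    candidates = [f for f in families if is_candidate(f)]
    return sorted(candidates, key=lambda f: (-len(set(f.member_classes)), f.family_id))


if __name__ == "__main__":
    # Invented toy data, for illustration only.
    toy = [
        Family("A", ["Bacilli", "Clostridia", "Bacilli"]),
        Family("B", ["Bacilli", "Bacilli", "Bacilli"]),
        Family("C", ["Bacilli", "Clostridia", "Actinobacteria"], ["PF00001"]),
        Family("D", ["Bacilli", "Clostridia", "Actinobacteria", "Bacilli"]),
    ]
    for fam in rank_by_breadth(toy):
        print(fam.family_id, len(set(fam.member_classes)))
```

Keeping the thresholds as keyword arguments makes it easy to see how the candidate list changes when the minimum sequence or class counts are tightened.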