Remote homology and the functions of metagenomic dark matter

Predicted open reading frames (ORFs) that lack detectable homology to known proteins are termed ORFans. Despite their prevalence in metagenomes, the extent to which ORFans encode real proteins, the degree to which they can be annotated, and their functional contributions, remain unclear. To gain ins...

Bibliographic Details
Main authors: Lobb, Briallen; Kurtz, Daniel A.; Moreno-Hagelsieb, Gabriel; Doxey, Andrew C.
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2015
Subjects: Genetics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4508852/
https://www.ncbi.nlm.nih.gov/pubmed/26257768
http://dx.doi.org/10.3389/fgene.2015.00234
author Lobb, Briallen
Kurtz, Daniel A.
Moreno-Hagelsieb, Gabriel
Doxey, Andrew C.
collection PubMed
description Predicted open reading frames (ORFs) that lack detectable homology to known proteins are termed ORFans. Despite their prevalence in metagenomes, the extent to which ORFans encode real proteins, the degree to which they can be annotated, and their functional contributions, remain unclear. To gain insights into these questions, we applied sensitive remote-homology detection methods to functionally analyze ORFans from soil, marine, and human gut metagenome collections. ORFans were identified, clustered into sequence families, and annotated through profile-profile comparison to proteins of known structure. We found that a considerable number of metagenomic ORFans (73,896 of 484,121, 15.3%) exhibit significant remote homology to structurally characterized proteins, providing a means for ORFan functional profiling. The extent of detected remote homology far exceeds that obtained for artificial protein families (1.4%). As expected for real genes, the predicted functions of ORFans are significantly similar to the functions of their gene neighbors (p < 0.001). Compared to the functional profiles predicted through standard homology searches, ORFans show biologically intriguing differences. Many ORFan-enriched functions are virus-related and tend to reflect biological processes associated with extreme sequence diversity. Each environment also possesses a large number of unique ORFan families and functions, including some known to play important community roles such as gut microbial polysaccharide digestion. Lastly, ORFans are a valuable resource for finding novel enzymes of interest, as we demonstrate through the identification of hundreds of novel ORFan metalloproteases that all possess a signature catalytic motif despite a general lack of similarity to known proteins. Our ORFan functional predictions are a valuable resource for discovering novel protein families and exploring the boundaries of protein sequence space. 
All remote homology predictions are available at http://doxey.uwaterloo.ca/ORFans.
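The headline figures in the description can be checked with a few lines of arithmetic. The sketch below is only an illustration and is not taken from the article: it recomputes the share of ORFans with significant remote homology from the reported counts and compares it with the 1.4% rate reported for artificial protein families. The function and variable names are hypothetical.

```python
def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Counts and control rate as reported in the description above.
orfans_with_hits = 73_896    # ORFans with significant remote homology
orfans_total = 484_121       # all metagenomic ORFans analyzed
artificial_rate_pct = 1.4    # rate reported for artificial protein families

observed_pct = percent(orfans_with_hits, orfans_total)

# Ratio of the observed rate to the artificial-family rate (illustrative only).
fold_over_control = observed_pct / artificial_rate_pct

print(f"{observed_pct:.1f}% of ORFans show remote homology")
print(f"roughly {fold_over_control:.0f}-fold the artificial-family rate")
```

The ratio is a rough descriptive comparison, not the statistical test used by the authors.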
format Online
Article
Text
id pubmed-4508852
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-4508852 2015-08-07
Front Genet
Genetics
Frontiers Media S.A. 2015-07-21
/pmc/articles/PMC4508852/ /pubmed/26257768 http://dx.doi.org/10.3389/fgene.2015.00234
Text en
Copyright © 2015 Lobb, Kurtz, Moreno-Hagelsieb and Doxey. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Remote homology and the functions of metagenomic dark matter
topic Genetics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4508852/
https://www.ncbi.nlm.nih.gov/pubmed/26257768
http://dx.doi.org/10.3389/fgene.2015.00234