
Stalking the Fourth Domain in Metagenomic Data: Searching for, Discovering, and Interpreting Novel, Deep Branches in Marker Gene Phylogenetic Trees

BACKGROUND: Most of our knowledge about the ancient evolutionary history of organisms has been derived from data associated with specific known organisms (i.e., organisms that we can study directly such as plants, metazoans, and culturable microbes). Recently, however, a new source of data for such...


Bibliographic Details
Main Authors: Wu, Dongying, Wu, Martin, Halpern, Aaron, Rusch, Douglas B., Yooseph, Shibu, Frazier, Marvin, Venter, J. Craig, Eisen, Jonathan A.
Format: Text
Language: English
Published: Public Library of Science 2011
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3060911/
https://www.ncbi.nlm.nih.gov/pubmed/21437252
http://dx.doi.org/10.1371/journal.pone.0018011
_version_ 1782200561934794752
author Wu, Dongying
Wu, Martin
Halpern, Aaron
Rusch, Douglas B.
Yooseph, Shibu
Frazier, Marvin
Venter, J. Craig
Eisen, Jonathan A.
author_sort Wu, Dongying
collection PubMed
description BACKGROUND: Most of our knowledge about the ancient evolutionary history of organisms has been derived from data associated with specific known organisms (i.e., organisms that we can study directly such as plants, metazoans, and culturable microbes). Recently, however, a new source of data for such studies has arrived: DNA sequence data generated directly from environmental samples. Such metagenomic data has enormous potential in a variety of areas including, as we argue here, in studies of very early events in the evolution of gene families and of species. METHODOLOGY/PRINCIPAL FINDINGS: We designed and implemented new methods for analyzing metagenomic data and used them to search the Global Ocean Sampling (GOS) Expedition data set for novel lineages in three gene families commonly used in phylogenetic studies of known and unknown organisms: small subunit rRNA and the recA and rpoB superfamilies. Though the methods available could not accurately identify very deeply branched ss-rRNAs (largely due to difficulties in making robust sequence alignments for novel rRNA fragments), our analysis revealed the existence of multiple novel branches in the recA and rpoB gene families. Analysis of available sequence data likely from the same genomes as these novel recA and rpoB homologs was then used to further characterize the possible organismal source of the novel sequences. CONCLUSIONS/SIGNIFICANCE: Of the novel recA and rpoB homologs identified in the metagenomic data, some likely come from uncharacterized viruses while others may represent ancient paralogs not yet seen in any cultured organism. A third possibility is that some come from novel cellular lineages that are only distantly related to any organisms for which sequence data is currently available. If there exist any major, but so-far-undiscovered, deeply branching lineages in the tree of life, we suggest that methods such as those described herein currently offer the best way to search for them.
format Text
id pubmed-3060911
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3060911 2011-03-23 PLoS One Research Article Public Library of Science 2011-03-18 /pmc/articles/PMC3060911/ /pubmed/21437252 http://dx.doi.org/10.1371/journal.pone.0018011 Text en This is an open-access article distributed under the terms of the Creative Commons Public Domain declaration, which stipulates that, once placed in the public domain, this work may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. https://creativecommons.org/publicdomain/zero/1.0/
title Stalking the Fourth Domain in Metagenomic Data: Searching for, Discovering, and Interpreting Novel, Deep Branches in Marker Gene Phylogenetic Trees
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3060911/
https://www.ncbi.nlm.nih.gov/pubmed/21437252
http://dx.doi.org/10.1371/journal.pone.0018011