Telling the whole story in a 10,000-genome world

BACKGROUND: Genome sequencing has revolutionized our view of the relationships among genomes, particularly in revealing the confounding effects of lateral genetic transfer (LGT). Phylogenomic techniques have been used to construct purported trees of microbial life. Although such trees are easily int...

Bibliographic Details
Main author: Beiko, Robert G
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Subjects: Research
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3158115/
https://www.ncbi.nlm.nih.gov/pubmed/21714939
http://dx.doi.org/10.1186/1745-6150-6-34
_version_ 1782210365021487104
author Beiko, Robert G
collection PubMed
description BACKGROUND: Genome sequencing has revolutionized our view of the relationships among genomes, particularly in revealing the confounding effects of lateral genetic transfer (LGT). Phylogenomic techniques have been used to construct purported trees of microbial life. Although such trees are easily interpreted and allow the use of a subset of genomes as "proxies" for the full set, LGT and other phenomena impact the positioning of different groups in genome trees, confounding and potentially invalidating attempts to construct a phylogeny-based taxonomy of microorganisms. Network and graph approaches can reveal complex sets of relationships, but applying these techniques to large data sets is a significant challenge. Notwithstanding the question of what exactly it might represent, generating and interpreting a Tree or Network of All Genomes will only be feasible if current algorithms can be improved upon. RESULTS: Complex relationships among even the most-similar genomes demonstrate that proxy-based approaches to simplifying large sets of genomes are not alone sufficient to solve the analysis problem. A phylogenomic analysis of 1173 sequenced bacterial and archaeal genomes generated phylogenetic trees for 159,905 distinct homologous gene sets. The relationships inferred from this set can be heavily dependent on the inclusion of other taxa: for example, phyla such as Spirochaetes, Proteobacteria and Firmicutes are recovered as cohesive groups or split depending on the presence of other specific lineages. Furthermore, named groups such as Acidithiobacillus, Coprothermobacter and Brachyspira show a multitude of affiliations that are more consistent with their ecology than with small subunit ribosomal DNA-based taxonomy. Network and graph representations can illustrate the multitude of conflicting affinities, but all methods impose constraints on the input data and create challenges of construction and interpretation. 
CONCLUSIONS: These complex relationships highlight the need for an inclusive approach to genomic data, and current methods with minor alterations will likely scale to allow the analysis of data sets with 10,000 or more genomes. The main challenges lie in the visualization and interpretation of genomic relationships, and the redefinition of microbial taxonomy when subsets of genomic data are so evidently in conflict with one another, and with the "canonical" molecular taxonomy. REVIEWERS: The manuscript was reviewed by William Martin, W. Ford Doolittle, Joel Velasco and Eugene Koonin.
format Online
Article
Text
id pubmed-3158115
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3158115 2011-08-19 Telling the whole story in a 10,000-genome world Beiko, Robert G Biol Direct Research BioMed Central 2011-06-30 /pmc/articles/PMC3158115/ /pubmed/21714939 http://dx.doi.org/10.1186/1745-6150-6-34 Text en Copyright ©2011 Beiko; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Telling the whole story in a 10,000-genome world
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3158115/
https://www.ncbi.nlm.nih.gov/pubmed/21714939
http://dx.doi.org/10.1186/1745-6150-6-34