Automatic selection of representative proteins for bacterial phylogeny

BACKGROUND: Although there are now about 200 complete bacterial genomes in GenBank, deep bacterial phylogeny remains a difficult problem, due to confounding horizontal gene transfers and other phylogenetic "noise". Previous methods have relied primarily upon biological intuition or manual...

Bibliographic Details
Main authors: Bern, Marshall; Goldberg, David
Format: Text
Language: English
Published: BioMed Central 2005
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1175084/
https://www.ncbi.nlm.nih.gov/pubmed/15927057
http://dx.doi.org/10.1186/1471-2148-5-34
_version_ 1782124507849293824
author Bern, Marshall
Goldberg, David
author_facet Bern, Marshall
Goldberg, David
author_sort Bern, Marshall
collection PubMed
description BACKGROUND: Although there are now about 200 complete bacterial genomes in GenBank, deep bacterial phylogeny remains a difficult problem, due to confounding horizontal gene transfers and other phylogenetic "noise". Previous methods have relied primarily upon biological intuition or manual curation for choosing genomic sequences unlikely to be horizontally transferred, and have given inconsistent phylogenies with poor bootstrap confidence. RESULTS: We describe an algorithm that automatically picks "representative" protein families from entire genomes for use as phylogenetic characters. A representative protein family is one that, taken alone, gives an organismal distance matrix in good agreement with a distance matrix computed from all sufficiently conserved proteins. We then use maximum-likelihood methods to compute phylogenetic trees from a concatenation of representative sequences. We validate the use of representative proteins on a number of small phylogenetic questions with accepted answers. We then use our methodology to compute a robust and well-resolved phylogenetic tree for a diverse set of sequenced bacteria. The tree agrees closely with a recently published tree computed using manually curated proteins, and supports two proposed high-level clades: one containing Actinobacteria, Deinococcus, and Cyanobacteria ("Terrabacteria"), and another containing Planctomycetes and Chlamydiales. CONCLUSION: Representative proteins provide an effective solution to the problem of selecting phylogenetic characters.
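The selection criterion in the description above (a family is representative if its own organismal distance matrix agrees well with one computed from all sufficiently conserved proteins) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The record does not state how agreement is measured or how the reference matrix is built, so the sketch assumes Pearson correlation over pairwise distances and an element-wise mean over all families as the reference. The function names and the toy matrices are hypothetical.

```python
from itertools import combinations
from math import sqrt


def upper_triangle(matrix):
    """Flatten the strictly upper triangle of a square distance matrix."""
    n = len(matrix)
    return [matrix[i][j] for i, j in combinations(range(n), 2)]


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_families(family_matrices):
    """Rank families by agreement with the mean distance matrix over all families.

    Assumption: the reference is the element-wise mean of the per-family
    matrices, and agreement is Pearson correlation of pairwise distances.
    """
    names = list(family_matrices)
    vectors = {name: upper_triangle(family_matrices[name]) for name in names}
    length = len(vectors[names[0]])
    reference = [
        sum(vectors[name][k] for name in names) / len(names) for k in range(length)
    ]
    scores = {name: pearson(vectors[name], reference) for name in names}
    ranking = sorted(names, key=lambda name: scores[name], reverse=True)
    return ranking, scores


# Toy data (hypothetical): four organisms, three protein families.
# fam_a and fam_b tell the same story; fam_c is discordant, as a
# horizontally transferred gene might be.
families = {
    "fam_a": [[0.0, 0.1, 0.5, 0.6], [0.1, 0.0, 0.5, 0.6],
              [0.5, 0.5, 0.0, 0.2], [0.6, 0.6, 0.2, 0.0]],
    "fam_b": [[0.0, 0.2, 0.6, 0.7], [0.2, 0.0, 0.6, 0.7],
              [0.6, 0.6, 0.0, 0.3], [0.7, 0.7, 0.3, 0.0]],
    "fam_c": [[0.0, 0.6, 0.1, 0.5], [0.6, 0.0, 0.5, 0.2],
              [0.1, 0.5, 0.0, 0.6], [0.5, 0.2, 0.6, 0.0]],
}

ranking, scores = rank_families(families)
print(ranking)
print(scores)
```

In this toy case the discordant family ranks last, and the top-ranked families would be the ones kept for concatenation. Because each family contributes to the mean reference it is scored against, a more careful version might exclude the family under test from the reference.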
format Text
id pubmed-1175084
institution National Center for Biotechnology Information
language English
publishDate 2005
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1175084 2005-07-14 Automatic selection of representative proteins for bacterial phylogeny Bern, Marshall Goldberg, David BMC Evol Biol Research Article BioMed Central 2005-05-31 /pmc/articles/PMC1175084/ /pubmed/15927057 http://dx.doi.org/10.1186/1471-2148-5-34 Text en Copyright © 2005 Bern and Goldberg; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Bern, Marshall
Goldberg, David
Automatic selection of representative proteins for bacterial phylogeny
title Automatic selection of representative proteins for bacterial phylogeny
title_sort automatic selection of representative proteins for bacterial phylogeny
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1175084/
https://www.ncbi.nlm.nih.gov/pubmed/15927057
http://dx.doi.org/10.1186/1471-2148-5-34