Automatic selection of representative proteins for bacterial phylogeny
BACKGROUND: Although there are now about 200 complete bacterial genomes in GenBank, deep bacterial phylogeny remains a difficult problem, due to confounding horizontal gene transfers and other phylogenetic "noise". Previous methods have relied primarily upon biological intuition or manual...
Main authors: | Bern, Marshall, Goldberg, David |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2005 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1175084/ https://www.ncbi.nlm.nih.gov/pubmed/15927057 http://dx.doi.org/10.1186/1471-2148-5-34 |
_version_ | 1782124507849293824 |
---|---|
author | Bern, Marshall Goldberg, David |
author_facet | Bern, Marshall Goldberg, David |
author_sort | Bern, Marshall |
collection | PubMed |
description | BACKGROUND: Although there are now about 200 complete bacterial genomes in GenBank, deep bacterial phylogeny remains a difficult problem, due to confounding horizontal gene transfers and other phylogenetic "noise". Previous methods have relied primarily upon biological intuition or manual curation for choosing genomic sequences unlikely to be horizontally transferred, and have given inconsistent phylogenies with poor bootstrap confidence. RESULTS: We describe an algorithm that automatically picks "representative" protein families from entire genomes for use as phylogenetic characters. A representative protein family is one that, taken alone, gives an organismal distance matrix in good agreement with a distance matrix computed from all sufficiently conserved proteins. We then use maximum-likelihood methods to compute phylogenetic trees from a concatenation of representative sequences. We validate the use of representative proteins on a number of small phylogenetic questions with accepted answers. We then use our methodology to compute a robust and well-resolved phylogenetic tree for a diverse set of sequenced bacteria. The tree agrees closely with a recently published tree computed using manually curated proteins, and supports two proposed high-level clades: one containing Actinobacteria, Deinococcus, and Cyanobacteria ("Terrabacteria"), and another containing Planctomycetes and Chlamydiales. CONCLUSION: Representative proteins provide an effective solution to the problem of selecting phylogenetic characters. |
format | Text |
id | pubmed-1175084 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2005 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-11750842005-07-14 Automatic selection of representative proteins for bacterial phylogeny Bern, Marshall Goldberg, David BMC Evol Biol Research Article BACKGROUND: Although there are now about 200 complete bacterial genomes in GenBank, deep bacterial phylogeny remains a difficult problem, due to confounding horizontal gene transfers and other phylogenetic "noise". Previous methods have relied primarily upon biological intuition or manual curation for choosing genomic sequences unlikely to be horizontally transferred, and have given inconsistent phylogenies with poor bootstrap confidence. RESULTS: We describe an algorithm that automatically picks "representative" protein families from entire genomes for use as phylogenetic characters. A representative protein family is one that, taken alone, gives an organismal distance matrix in good agreement with a distance matrix computed from all sufficiently conserved proteins. We then use maximum-likelihood methods to compute phylogenetic trees from a concatenation of representative sequences. We validate the use of representative proteins on a number of small phylogenetic questions with accepted answers. We then use our methodology to compute a robust and well-resolved phylogenetic tree for a diverse set of sequenced bacteria. The tree agrees closely with a recently published tree computed using manually curated proteins, and supports two proposed high-level clades: one containing Actinobacteria, Deinococcus, and Cyanobacteria ("Terrabacteria"), and another containing Planctomycetes and Chlamydiales. CONCLUSION: Representative proteins provide an effective solution to the problem of selecting phylogenetic characters. BioMed Central 2005-05-31 /pmc/articles/PMC1175084/ /pubmed/15927057 http://dx.doi.org/10.1186/1471-2148-5-34 Text en Copyright © 2005 Bern and Goldberg; licensee BioMed Central Ltd. 
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Bern, Marshall Goldberg, David Automatic selection of representative proteins for bacterial phylogeny |
title | Automatic selection of representative proteins for bacterial phylogeny |
title_full | Automatic selection of representative proteins for bacterial phylogeny |
title_fullStr | Automatic selection of representative proteins for bacterial phylogeny |
title_full_unstemmed | Automatic selection of representative proteins for bacterial phylogeny |
title_short | Automatic selection of representative proteins for bacterial phylogeny |
title_sort | automatic selection of representative proteins for bacterial phylogeny |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1175084/ https://www.ncbi.nlm.nih.gov/pubmed/15927057 http://dx.doi.org/10.1186/1471-2148-5-34 |
work_keys_str_mv | AT bernmarshall automaticselectionofrepresentativeproteinsforbacterialphylogeny AT goldbergdavid automaticselectionofrepresentativeproteinsforbacterialphylogeny |
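
The description above defines a representative protein family as one whose organismal distance matrix, taken alone, agrees well with a matrix computed from all sufficiently conserved proteins. The record does not say how agreement is measured, so the following is only a minimal sketch under stated assumptions: the reference matrix is taken to be the entry-wise mean over all families, and agreement is taken to be the Pearson correlation of the off-diagonal distances. The function names and the toy data are hypothetical and are not taken from the article.

```python
from itertools import combinations
from statistics import mean


def upper_triangle(matrix):
    """Flatten the strict upper triangle of a square distance matrix."""
    n = len(matrix)
    return [matrix[i][j] for i, j in combinations(range(n), 2)]


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def rank_families(family_matrices):
    """Rank protein families by agreement with the all-family mean matrix.

    ASSUMPTION: mean-of-all-families as the reference and Pearson correlation
    as the agreement score are illustrative choices, not the paper's method.
    """
    flat = {name: upper_triangle(m) for name, m in family_matrices.items()}
    n_pairs = len(next(iter(flat.values())))
    reference = [mean(v[k] for v in flat.values()) for k in range(n_pairs)]
    scores = {name: pearson(v, reference) for name, v in flat.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def example_families():
    """Toy data for four organisms; famC mimics a horizontally transferred gene."""
    return {
        "famA": [[0, 0.1, 0.5, 0.6], [0.1, 0, 0.5, 0.6],
                 [0.5, 0.5, 0, 0.2], [0.6, 0.6, 0.2, 0]],
        "famB": [[0, 0.2, 0.6, 0.7], [0.2, 0, 0.55, 0.65],
                 [0.6, 0.55, 0, 0.25], [0.7, 0.65, 0.25, 0]],
        "famC": [[0, 0.6, 0.1, 0.5], [0.6, 0, 0.6, 0.2],
                 [0.1, 0.6, 0, 0.5], [0.5, 0.2, 0.5, 0]],
    }
```

Calling `rank_families(example_families())` returns the families sorted from most to least representative under these assumptions; the toy family with scrambled distances ranks last. In a real pipeline the top-ranked families would then be concatenated and passed to a maximum-likelihood tree builder, as the description states.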