
Genome BLAST distance phylogenies inferred from whole plastid and whole mitochondrion genome sequences

BACKGROUND: Phylogenetic methods which do not rely on multiple sequence alignments are important tools in inferring trees directly from completely sequenced genomes. Here, we extend the recently described Genome BLAST Distance Phylogeny (GBDP) strategy to compute phylogenetic trees from all complete...


Bibliographic Details
Main authors: Auch, Alexander F, Henz, Stefan R, Holland, Barbara R, Göker, Markus
Format: Text
Language: English
Published: BioMed Central 2006
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1564419/
https://www.ncbi.nlm.nih.gov/pubmed/16854218
http://dx.doi.org/10.1186/1471-2105-7-350
author Auch, Alexander F
Henz, Stefan R
Holland, Barbara R
Göker, Markus
collection PubMed
description BACKGROUND: Phylogenetic methods which do not rely on multiple sequence alignments are important tools in inferring trees directly from completely sequenced genomes. Here, we extend the recently described Genome BLAST Distance Phylogeny (GBDP) strategy to compute phylogenetic trees from all completely sequenced plastid genomes currently available and from a selection of mitochondrial genomes representing the major eukaryotic lineages. BLASTN, TBLASTX, or combinations of both are used to locate high-scoring segment pairs (HSPs) between two sequences from which pairwise similarities and distances are computed in different ways resulting in a total of 96 GBDP variants. The suitability of these distance formulae for phylogeny reconstruction is directly estimated by computing a recently described measure of "treelikeness", the so-called δ value, from the respective distance matrices. Additionally, we compare the trees inferred from these matrices using UPGMA, NJ, BIONJ, FastME, or STC, respectively, with the NCBI taxonomy tree of the taxa under study. RESULTS: Our results indicate that, at this taxonomic level, plastid genomes are much more valuable for inferring phylogenies than are mitochondrial genomes, and that distances based on breakpoints are of little use. Distances based on the proportion of "matched" HSP length to average genome length were best for tree estimation. Additionally we found that using TBLASTX instead of BLASTN and, particularly, combining TBLASTX and BLASTN leads to a small but significant increase in accuracy. Other factors do not significantly affect the phylogenetic outcome. The BIONJ algorithm results in phylogenies most in accordance with the current NCBI taxonomy, with NJ and FastME performing insignificantly worse, and STC performing as well if applied to high quality distance matrices. δ values are found to be a reliable predictor of phylogenetic accuracy. 
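The best-performing distances named in the RESULTS relate the "matched" HSP length to the average genome length. The record does not give the formula, so the following is only a minimal sketch under stated assumptions: HSPs are given as half-open coordinate intervals on each genome, the "matched" length is taken to be the mean of the two per-genome union coverages, and the distance is one minus the ratio of that value to the average genome length. The names `covered_length` and `coverage_distance` are hypothetical and do not correspond to any of the 96 GBDP variants specifically.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in genome coordinates


def covered_length(intervals: List[Interval]) -> int:
    """Number of positions covered by the union of the intervals."""
    total = 0
    current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Interval starts beyond everything seen so far: count it fully.
            total += end - start
            current_end = end
        elif end > current_end:
            # Interval overlaps or touches: count only the new part.
            total += end - current_end
            current_end = end
    return total


def coverage_distance(
    hsps_x: List[Interval],
    hsps_y: List[Interval],
    len_x: int,
    len_y: int,
) -> float:
    """Assumed distance: 1 - (mean matched length / average genome length)."""
    matched = (covered_length(hsps_x) + covered_length(hsps_y)) / 2
    average_length = (len_x + len_y) / 2
    return 1.0 - matched / average_length


if __name__ == "__main__":
    # Toy HSP coordinates, invented for illustration only.
    print(coverage_distance([(0, 30), (20, 60)], [(10, 50)], 100, 100))  # 0.5
```

Taking the union of intervals keeps overlapping HSPs from being counted twice, which is one plausible reading of "matched" length; a sum of raw HSP lengths would be a different variant.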
CONCLUSION: Using the most treelike distance matrices, as judged by their δ values, distance methods are able to recover all major plant lineages, and are more in accordance with Apicomplexa organelles being derived from "green" plastids than from plastids of the "red" type. GBDP-like methods can be used to reliably infer phylogenies from different kinds of genomic data. A framework is established to further develop and improve such methods. δ values are a topology-independent tool of general use for the development and assessment of distance methods for phylogenetic inference.
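The δ value mentioned above is computed from a distance matrix alone, which is why it is topology-independent. The record does not spell out the definition; the sketch below assumes the usual quartet-based form: for each set of four taxa, the three possible pairings of distances are summed, the sums are ordered, and δ for that quartet is (largest − middle) / (largest − smallest), with 0 when all three sums are equal. The mean over all quartets is then taken as the matrix-wide value. Function names are hypothetical.

```python
from itertools import combinations
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def quartet_delta(d: Matrix, w: int, x: int, y: int, z: int) -> float:
    """Delta for one quartet: 0 is perfectly treelike, 1 is least treelike."""
    sums = sorted(
        (d[w][x] + d[y][z], d[w][y] + d[x][z], d[w][z] + d[x][y]),
        reverse=True,
    )
    largest, middle, smallest = sums
    if largest == smallest:
        # All three pairings are equal (star-like quartet): define delta as 0.
        return 0.0
    return (largest - middle) / (largest - smallest)


def mean_delta(d: Matrix) -> float:
    """Mean quartet delta over all 4-subsets of taxa in a symmetric matrix."""
    values = [quartet_delta(d, *q) for q in combinations(range(len(d)), 4)]
    if not values:
        raise ValueError("at least four taxa are required")
    return sum(values) / len(values)


if __name__ == "__main__":
    # Additive distances on the tree ((A,B),(C,D)) with all edge lengths 1.
    tree_like = [
        [0, 2, 3, 3],
        [2, 0, 3, 3],
        [3, 3, 0, 2],
        [3, 3, 2, 0],
    ]
    print(mean_delta(tree_like))  # 0.0
```

For distances that fit a tree exactly, the two largest sums coincide in every quartet (the four-point condition), so δ is 0; values closer to 1 indicate conflicting signal. Enumerating all quartets grows with the fourth power of the number of taxa, so large matrices may need sampling.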
format Text
id pubmed-1564419
institution National Center for Biotechnology Information
language English
publishDate 2006
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1564419 2006-09-14 BMC Bioinformatics Research Article BioMed Central 2006-07-19 /pmc/articles/PMC1564419/ /pubmed/16854218 http://dx.doi.org/10.1186/1471-2105-7-350 Text en Copyright © 2006 Auch et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Genome BLAST distance phylogenies inferred from whole plastid and whole mitochondrion genome sequences
topic Research Article