Whole genome phylogenies for multiple Drosophila species

BACKGROUND: Reconstructing the evolutionary history of organisms using traditional phylogenetic methods may suffer from inaccurate sequence alignment. An alternative approach, particularly effective when whole genome sequences are available, is to employ methods that don’t use explicit sequence alig...

Bibliographic Details
Main authors: Seetharam, Arun; Stuart, Gary W
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3531268/
https://www.ncbi.nlm.nih.gov/pubmed/23210901
http://dx.doi.org/10.1186/1756-0500-5-670
_version_ 1782254144908689408
author Seetharam, Arun
Stuart, Gary W
collection PubMed
description BACKGROUND: Reconstructing the evolutionary history of organisms using traditional phylogenetic methods may suffer from inaccurate sequence alignment. An alternative approach, particularly effective when whole genome sequences are available, is to employ methods that don’t use explicit sequence alignments. We extend a novel phylogenetic method based on Singular Value Decomposition (SVD) to reconstruct the phylogeny of 12 sequenced Drosophila species. SVD analysis provides accurate comparisons for a high fraction of sequences within whole genomes without the prior identification of orthologs or homologous sites. With this method all protein sequences are converted to peptide frequency vectors within a matrix that is decomposed to provide simplified vector representations for each protein of the genome in a reduced dimensional space. These vectors are summed together to provide a vector representation for each species, and the angle between these vectors provides distance measures that are used to construct species trees.
RESULTS: An unfiltered whole genome analysis (193,622 predicted proteins) strongly supports the currently accepted phylogeny for 12 Drosophila species at higher dimensions except for the generally accepted but difficult to discern sister relationship between D. erecta and D. yakuba. Also, in accordance with previous studies, many sequences appear to support alternative phylogenies. In this case, we observed grouping of D. erecta with D. sechellia when approximately 55% to 95% of the proteins were removed using a filter based on projection values or by reducing resolution by using fewer dimensions. Similar results were obtained when just the melanogaster subgroup was analyzed.
CONCLUSIONS: These results indicate that using our novel phylogenetic method, it is possible to consult and interpret all predicted protein sequences within multiple whole genomes to produce accurate phylogenetic estimations of relatedness between Drosophila species. Furthermore, protein filtering can be effectively applied to reduce incongruence in the dataset as well as to generate alternative phylogenies.
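The pipeline summarized in the abstract (peptide frequency vectors, SVD, summed protein vectors per species, angles as distances) can be illustrated with a minimal sketch. This is not the authors' implementation: the peptide length `k`, the number of retained dimensions `dims`, the proteins-as-rows matrix orientation, and the function names are all assumptions made for illustration, and the projection-value filter mentioned in the abstract is not modeled.

```python
import numpy as np
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def peptide_frequency_vector(seq, k, index):
    """Count overlapping length-k peptides in one protein sequence."""
    vec = np.zeros(len(index))
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip peptides with non-standard residues
            vec[j] += 1
    return vec


def species_distance_matrix(proteomes, k=2, dims=3):
    """proteomes: dict mapping species name -> list of protein sequences.

    Returns (species names, matrix of pairwise angles in degrees).
    k and dims are illustrative defaults, not values from the paper.
    """
    index = {"".join(p): i
             for i, p in enumerate(product(AMINO_ACIDS, repeat=k))}
    species = list(proteomes)
    rows, owner = [], []
    for s_idx, name in enumerate(species):
        for seq in proteomes[name]:
            rows.append(peptide_frequency_vector(seq, k, index))
            owner.append(s_idx)
    matrix = np.array(rows)  # assumed layout: proteins x peptides

    # Decompose and keep only the leading singular dimensions.
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    dims = min(dims, len(s))
    reduced = u[:, :dims] * s[:dims]  # one reduced vector per protein

    # Sum the protein vectors of each species into one species vector.
    species_vecs = np.zeros((len(species), dims))
    for row, s_idx in zip(reduced, owner):
        species_vecs[s_idx] += row

    # The angle between species vectors serves as the distance.
    unit = species_vecs / np.linalg.norm(species_vecs, axis=1, keepdims=True)
    cosines = np.clip(unit @ unit.T, -1.0, 1.0)
    return species, np.degrees(np.arccos(cosines))
```

Under these assumptions, the returned angle matrix could be passed to any distance-based tree builder (for example, a neighbor-joining implementation) to obtain a species tree. A real analysis would use far larger peptide vocabularies and many more dimensions than this toy default.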
format Online
Article
Text
id pubmed-3531268
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3531268 2013-01-03 Whole genome phylogenies for multiple Drosophila species Seetharam, Arun Stuart, Gary W BMC Res Notes Research Article BioMed Central 2012-12-04 /pmc/articles/PMC3531268/ /pubmed/23210901 http://dx.doi.org/10.1186/1756-0500-5-670 Text en Copyright ©2012 Seetharam and Stuart; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Whole genome phylogenies for multiple Drosophila species
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3531268/
https://www.ncbi.nlm.nih.gov/pubmed/23210901
http://dx.doi.org/10.1186/1756-0500-5-670