
A support vector machine based test for incongruence between sets of trees in tree space

BACKGROUND: The increased use of multi-locus data sets for phylogenetic reconstruction has increased the need to determine whether a set of gene trees significantly deviates from the phylogenetic patterns of other genes. Such unusual gene trees may have been influenced by other evolutionary processes...


Bibliographic Details
Main authors: Haws, David C, Huggins, Peter, O’Neill, Eric M, Weisrock, David W, Yoshida, Ruriko
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3536566/
https://www.ncbi.nlm.nih.gov/pubmed/22909268
http://dx.doi.org/10.1186/1471-2105-13-210
_version_ 1782254758121177088
author Haws, David C
Huggins, Peter
O’Neill, Eric M
Weisrock, David W
Yoshida, Ruriko
collection PubMed
description BACKGROUND: The increased use of multi-locus data sets for phylogenetic reconstruction has increased the need to determine whether a set of gene trees significantly deviates from the phylogenetic patterns of other genes. Such unusual gene trees may have been influenced by other evolutionary processes such as selection, gene duplication, or horizontal gene transfer. RESULTS: Motivated by this problem, we propose a nonparametric goodness-of-fit test for two empirical distributions of gene trees, and we developed the software GeneOut to estimate a p-value for the test. Our approach maps trees into a multi-dimensional vector space and then applies support vector machines (SVMs) to measure the separation between two sets of pre-defined trees. We use a permutation test to assess the significance of the SVM separation. To demonstrate the performance of GeneOut, we applied it to the comparison of gene trees simulated within different species trees across a range of species tree depths. Applied directly to sets of simulated gene trees with large sample sizes, GeneOut was able to detect very small differences between two sets of gene trees generated under different species trees. Our statistical test can also incorporate tree reconstruction into its test framework through a variety of phylogenetic optimality criteria. When applied to DNA sequence data simulated from different sets of gene trees, results in the form of receiver operating characteristic (ROC) curves indicated that GeneOut performed well in the detection of differences between sets of trees with different distributions in a multi-dimensional space. Furthermore, it controlled false positive and false negative rates very well, indicating a high degree of accuracy.
CONCLUSIONS: The non-parametric nature of our statistical test provides fast and efficient analyses, and makes it an applicable test for any scenario where evolutionary or other factors can lead to trees with different multi-dimensional distributions. The software GeneOut is freely available under the GNU public license.
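The test described in the abstract (encode trees as vectors, measure how well an SVM separates the two sets, then calibrate that separation with a permutation test) can be illustrated with a minimal sketch. This is not GeneOut's code. It assumes each tree has already been encoded as a fixed-length numeric vector, uses a small hand-written linear SVM trained by subgradient descent on the hinge loss, and takes training accuracy as the separation statistic. The function names, the statistic, and all parameter values are hypothetical choices made only for illustration.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=100, lr=0.1):
    """Full-batch subgradient descent on the L2-regularised hinge loss.

    X is a list of equal-length float vectors, y a list of +1/-1 labels.
    Hyperparameters are illustrative assumptions, not tuned values.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1:  # point violates the margin
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def separation(X, y):
    """Separation statistic (an assumption): training accuracy of the SVM."""
    w, b = train_linear_svm(X, y)
    correct = 0
    for xi, yi in zip(X, y):
        score = sum(wj * xj for wj, xj in zip(w, xi)) + b
        if yi * score > 0:
            correct += 1
    return correct / len(X)


def permutation_p_value(set_a, set_b, n_perm=199, seed=0):
    """Estimate a p-value for the observed separation by shuffling labels."""
    rng = random.Random(seed)
    X = list(set_a) + list(set_b)
    y = [1] * len(set_a) + [-1] * len(set_b)
    observed = separation(X, y)
    hits = 0
    for _ in range(n_perm):
        shuffled = y[:]
        rng.shuffle(shuffled)
        if separation(X, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy vectors standing in for two sets of encoded trees.
    set_a = [[0.1 * i, 0.0, 0.1 * (i % 3)] for i in range(10)]
    set_b = [[3 + 0.1 * i, 3.0, 3 + 0.1 * (i % 3)] for i in range(10)]
    print(permutation_p_value(set_a, set_b, n_perm=99))
```

Under the null hypothesis that both sets come from the same distribution, the labels are exchangeable, so the shuffled-label separations form a reference distribution for the observed one. A small p-value then suggests the two sets occupy different regions of the vector space.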
format Online
Article
Text
id pubmed-3536566
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3536566 2013-01-08 A support vector machine based test for incongruence between sets of trees in tree space Haws, David C Huggins, Peter O’Neill, Eric M Weisrock, David W Yoshida, Ruriko BMC Bioinformatics Research Article BACKGROUND: The increased use of multi-locus data sets for phylogenetic reconstruction has increased the need to determine whether a set of gene trees significantly deviates from the phylogenetic patterns of other genes. Such unusual gene trees may have been influenced by other evolutionary processes such as selection, gene duplication, or horizontal gene transfer. RESULTS: Motivated by this problem, we propose a nonparametric goodness-of-fit test for two empirical distributions of gene trees, and we developed the software GeneOut to estimate a p-value for the test. Our approach maps trees into a multi-dimensional vector space and then applies support vector machines (SVMs) to measure the separation between two sets of pre-defined trees. We use a permutation test to assess the significance of the SVM separation. To demonstrate the performance of GeneOut, we applied it to the comparison of gene trees simulated within different species trees across a range of species tree depths. Applied directly to sets of simulated gene trees with large sample sizes, GeneOut was able to detect very small differences between two sets of gene trees generated under different species trees. Our statistical test can also incorporate tree reconstruction into its test framework through a variety of phylogenetic optimality criteria. When applied to DNA sequence data simulated from different sets of gene trees, results in the form of receiver operating characteristic (ROC) curves indicated that GeneOut performed well in the detection of differences between sets of trees with different distributions in a multi-dimensional space. Furthermore, it controlled false positive and false negative rates very well, indicating a high degree of accuracy.
CONCLUSIONS: The non-parametric nature of our statistical test provides fast and efficient analyses, and makes it an applicable test for any scenario where evolutionary or other factors can lead to trees with different multi-dimensional distributions. The software GeneOut is freely available under the GNU public license. BioMed Central 2012-08-21 /pmc/articles/PMC3536566/ /pubmed/22909268 http://dx.doi.org/10.1186/1471-2105-13-210 Text en Copyright ©2012 Haws et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Haws, David C
Huggins, Peter
O’Neill, Eric M
Weisrock, David W
Yoshida, Ruriko
A support vector machine based test for incongruence between sets of trees in tree space
title A support vector machine based test for incongruence between sets of trees in tree space
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3536566/
https://www.ncbi.nlm.nih.gov/pubmed/22909268
http://dx.doi.org/10.1186/1471-2105-13-210