A support vector machine based test for incongruence between sets of trees in tree space
Main authors: | Haws, David C; Huggins, Peter; O’Neill, Eric M; Weisrock, David W; Yoshida, Ruriko |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2012 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3536566/ https://www.ncbi.nlm.nih.gov/pubmed/22909268 http://dx.doi.org/10.1186/1471-2105-13-210 |
author | Haws, David C; Huggins, Peter; O’Neill, Eric M; Weisrock, David W; Yoshida, Ruriko |
---|---|
collection | PubMed |
description | BACKGROUND: The increased use of multi-locus data sets for phylogenetic reconstruction has increased the need to determine whether a set of gene trees deviates significantly from the phylogenetic patterns of other genes. Such unusual gene trees may have been influenced by other evolutionary processes such as selection, gene duplication, or horizontal gene transfer. RESULTS: Motivated by this problem, we propose a nonparametric goodness-of-fit test for two empirical distributions of gene trees, and we developed the software GeneOut to estimate a p-value for the test. Our approach maps trees into a multi-dimensional vector space and then applies support vector machines (SVMs) to measure the separation between two predefined sets of trees. We use a permutation test to assess the significance of the SVM separation. To demonstrate the performance of GeneOut, we applied it to gene trees simulated within different species trees across a range of species tree depths. Applied directly to sets of simulated gene trees with large sample sizes, GeneOut detected very small differences between two sets of gene trees generated under different species trees. The test can also incorporate tree reconstruction into its framework through a variety of phylogenetic optimality criteria. When the test was applied to DNA sequence data simulated from different sets of gene trees, receiver operating characteristic (ROC) curves indicated that GeneOut performed well at detecting differences between sets of trees with different distributions in a multi-dimensional space. It also kept both false positive and false negative rates low, indicating a high degree of accuracy.
CONCLUSIONS: The nonparametric nature of the test makes the analyses fast and efficient, and makes the test applicable to any scenario in which evolutionary or other factors can lead to trees with different multi-dimensional distributions. The software GeneOut is freely available under the GNU Public License. |
id | pubmed-3536566 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
journal | BMC Bioinformatics |
published online | 2012-08-21 |
license | Copyright © 2012 Haws et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | A support vector machine based test for incongruence between sets of trees in tree space |
topic | Research Article |
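The abstract describes the test only at a high level: embed trees as vectors, measure how well an SVM separates the two labelled sets, and calibrate that separation with a permutation test. The sketch below illustrates those three steps under several assumptions that are not taken from the paper and should not be read as GeneOut's implementation. Trees are hypothetical nested tuples of leaf labels with unit branch lengths. Each tree is embedded as its vector of pairwise leaf-to-leaf path lengths, which is a common tree-space embedding but is assumed here. The linear SVM is fitted with Pegasos-style stochastic subgradient steps in pure Python rather than a library solver. The separation statistic is assumed to be the SVM's training accuracy. All function names (`tree_to_vector`, `train_linear_svm`, `svm_separation`, `permutation_test`) are hypothetical.

```python
import itertools
import random


def leaf_paths(tree):
    """Map each leaf label to the ids of its internal-node ancestors, root first.

    Hypothetical input format: a rooted tree as nested tuples of leaf labels,
    e.g. (("A", "B"), ("C", "D")), with every branch treated as length 1.
    """
    paths = {}
    counter = itertools.count()

    def walk(node, prefix):
        if isinstance(node, str):
            paths[node] = prefix
            return
        node_id = next(counter)
        for child in node:
            walk(child, prefix + (node_id,))

    walk(tree, ())
    return paths


def tree_to_vector(tree, taxa):
    """Embed a tree as its vector of pairwise leaf-to-leaf path lengths (edge counts)."""
    paths = leaf_paths(tree)
    vec = []
    for a, b in itertools.combinations(taxa, 2):
        pa, pb = paths[a], paths[b]
        k = 0  # number of shared ancestors; the deepest one is the LCA, at depth k - 1
        while k < min(len(pa), len(pb)) and pa[k] == pb[k]:
            k += 1
        vec.append(float(len(pa) + len(pb) - 2 * (k - 1)))
    return vec


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(X, y, lam=0.01, epochs=30, seed=0):
    """Fit a linear soft-margin SVM without bias using Pegasos subgradient steps."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            violated = y[i] * dot(w, X[i]) < 1  # hinge loss is active for this sample
            w = [(1 - eta * lam) * wj for wj in w]  # shrink step from the L2 penalty
            if violated:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def svm_separation(X, y):
    """Assumed separation statistic: training accuracy of the fitted linear SVM."""
    w = train_linear_svm(X, y)
    correct = sum(1 for xi, yi in zip(X, y) if yi * dot(w, xi) > 0)
    return correct / len(X)


def permutation_test(trees_a, trees_b, taxa, n_perm=99, seed=1):
    """Return (observed statistic, permutation p-value) for two lists of trees."""
    X = [tree_to_vector(t, taxa) for t in list(trees_a) + list(trees_b)]
    # Center the pooled vectors so a hyperplane through the origin suffices.
    # Permuting labels leaves the pooled set, and hence the centering, unchanged.
    dim = len(X[0])
    mean = [sum(x[j] for x in X) / len(X) for j in range(dim)]
    X = [[xj - mj for xj, mj in zip(x, mean)] for x in X]
    y = [1] * len(trees_a) + [-1] * len(trees_b)
    observed = svm_separation(X, y)
    rng = random.Random(seed)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        shuffled = y[:]
        rng.shuffle(shuffled)
        if svm_separation(X, shuffled) >= observed:
            at_least_as_extreme += 1
    return observed, (at_least_as_extreme + 1) / (n_perm + 1)
```

For example, comparing ten copies of `(("A", "B"), ("C", "D"))` against ten copies of `(("A", "C"), ("B", "D"))` yields a separation of 1.0 and a small p-value, while comparing a set with itself yields a p-value of 1.0. The solver is written in pure Python only to keep the sketch dependency-free; a library SVM such as scikit-learn's `SVC`, a kernel other than the linear one, or a different statistic (for example the margin) could be substituted without changing the permutation logic.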