Combining Distance Matrices on Identical Taxon Sets for Multi-Gene Analysis with Singular Value Decomposition

We present a simple and effective method for combining distance matrices from multiple genes on identical taxon sets to obtain a single representative distance matrix from which to derive a combined-gene phylogenetic tree. The method applies singular value decomposition (SVD) to extract the greatest...

Bibliographic Details
Main Authors: Abeysundera, Melanie; Kenney, Toby; Field, Chris; Gu, Hong
Format: Online Article Text
Language: English
Published: Public Library of Science 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3986248/
https://www.ncbi.nlm.nih.gov/pubmed/24732341
http://dx.doi.org/10.1371/journal.pone.0094279
author Abeysundera, Melanie
Kenney, Toby
Field, Chris
Gu, Hong
collection PubMed
description We present a simple and effective method for combining distance matrices from multiple genes on identical taxon sets to obtain a single representative distance matrix from which to derive a combined-gene phylogenetic tree. The method applies singular value decomposition (SVD) to extract the greatest common signal present in the distances obtained from each gene. The first right eigenvector of the SVD, which corresponds to a weighted average of the distance matrices of all genes, can thus be used to derive a representative tree from multiple genes. We apply our method to three well-known data sets and estimate the uncertainty using bootstrap methods. Our results show that this method works well for these three data sets and that the uncertainty in these estimates is small. A simulation study is conducted to compare the performance of our method with several other distance-based approaches (namely SDM, SDM* and ACS97), and we find that the performances of all these approaches are comparable in the consensus setting. The computational complexity of our method is similar to that of SDM. Besides constructing a representative tree from multiple genes, we also demonstrate how the subsequent eigenvalues and eigenvectors may be used to identify whether there are conflicting signals in the data and which genes might be influential or outliers for the estimated combined-gene tree.
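The description above says that the first right eigenvector of the SVD corresponds to a weighted average of the per-gene distance matrices. The following is a minimal Python sketch of that idea, not the authors' implementation. Several details are assumptions, because this record does not specify them: each gene's matrix is flattened to its upper triangle and stacked as one row per gene, no centering or normalization is applied, and the result is rescaled so that the gene weights sum to one. The names `upper_triangle` and `combine_distance_matrices` are hypothetical.

```python
import numpy as np


def upper_triangle(d):
    """Flatten the strict upper triangle of a symmetric distance matrix."""
    i, j = np.triu_indices(d.shape[0], k=1)
    return d[i, j]


def combine_distance_matrices(matrices):
    """Combine per-gene distance matrices using the first singular triplet.

    Assumed layout (not taken from the record): one row per gene,
    one column per taxon pair.
    """
    n = matrices[0].shape[0]
    x = np.vstack([upper_triangle(np.asarray(m, dtype=float)) for m in matrices])

    u, s, vt = np.linalg.svd(x, full_matrices=False)
    u1, v1 = u[:, 0], vt[0]

    # Singular vectors are only defined up to a joint sign flip;
    # pick the orientation in which the distances are non-negative.
    if v1.sum() < 0:
        u1, v1 = -u1, -v1

    # v1 equals sum_g (u1[g] / s[0]) * x[g], a weighted sum of the gene rows.
    # Rescaling the weights to sum to one turns it into a weighted average.
    weights = u1 / u1.sum()
    combined_vec = weights @ x

    # Rebuild the symmetric matrix with a zero diagonal.
    combined = np.zeros((n, n))
    i, j = np.triu_indices(n, k=1)
    combined[i, j] = combined_vec
    combined[j, i] = combined_vec
    return combined, weights, s


if __name__ == "__main__":
    # Toy distances for three taxa and two genes (made-up numbers).
    gene_a = np.array([[0.0, 1.0, 2.0], [1.0, 0.0, 3.0], [2.0, 3.0, 0.0]])
    gene_b = np.array([[0.0, 1.2, 1.9], [1.2, 0.0, 3.1], [1.9, 3.1, 0.0]])
    combined, weights, s = combine_distance_matrices([gene_a, gene_b])
    print("gene weights:", weights)
    print("combined distance matrix:\n", combined)
    print("singular values:", s)
```

The combined matrix could then be passed to any distance-based tree-building method. The returned singular values give a rough view of how much signal lies outside the first component, which is the kind of information the description says can point to conflicting signals or influential genes.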
format Online
Article
Text
id pubmed-3986248
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3986248 2014-04-15 PLoS One Research Article
Public Library of Science 2014-04-14 /pmc/articles/PMC3986248/ /pubmed/24732341 http://dx.doi.org/10.1371/journal.pone.0094279 Text en © 2014 Abeysundera et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Combining Distance Matrices on Identical Taxon Sets for Multi-Gene Analysis with Singular Value Decomposition
topic Research Article