Comparison of methods for estimating the nucleotide substitution matrix

Bibliographic Details
Main Authors: Oscamou, Maribeth, McDonald, Daniel, Yap, Von Bing, Huttley, Gavin A, Lladser, Manuel E, Knight, Rob
Format: Text
Language: English
Published: BioMed Central 2008
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2655096/
https://www.ncbi.nlm.nih.gov/pubmed/19046431
http://dx.doi.org/10.1186/1471-2105-9-511
_version_ 1782165437740482560
author Oscamou, Maribeth
McDonald, Daniel
Yap, Von Bing
Huttley, Gavin A
Lladser, Manuel E
Knight, Rob
collection PubMed
description BACKGROUND: The nucleotide substitution rate matrix is a key parameter of molecular evolution. Several methods for inferring this parameter have been proposed, with different mathematical bases. These methods include counting sequence differences and taking the log of the resulting probability matrices, methods based on Markov triples, and maximum likelihood methods that infer the substitution probabilities that lead to the most likely model of evolution. However, the speed and accuracy of these methods have not been compared. RESULTS: Different methods differ in performance by orders of magnitude (ranging from 1 ms to 10 s per matrix), but differences in accuracy of rate matrix reconstruction appear to be relatively small. Encouragingly, relatively simple and fast methods can provide results at least as accurate as far more complex and computationally intensive methods, especially when the sequences to be compared are relatively short. CONCLUSION: Based on the conditions tested, we recommend the use of the method of Gojobori et al. (1982) for long sequences (> 600 nucleotides), and the method of Goldman et al. (1996) for shorter sequences (< 600 nucleotides). The method of Barry and Hartigan (1987) can provide somewhat more accuracy, measured as the Euclidean distance between the true and inferred matrices, on long sequences (> 2000 nucleotides) at the expense of substantially longer computation time. The availability of methods that are both fast and accurate will allow us to gain a global picture of change in the nucleotide substitution rate matrix on a genomewide scale across the tree of life.
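The description mentions two ideas that a small example may make more concrete: estimating a rate matrix by counting aligned differences and taking the log of the resulting probability matrix, and scoring accuracy as the Euclidean distance between the true and inferred matrices. The Python sketch below illustrates only that general idea; it is not the implementation of any of the cited methods. All function names are hypothetical, and the matrix logarithm uses a truncated power series (an assumption made here to avoid external libraries) that converges only when the probability matrix is close to the identity, i.e. for weakly diverged sequences.

```python
import math

NUCLEOTIDES = "ACGT"


def count_matrix(seq_a, seq_b):
    """counts[i][j] = aligned sites with NUCLEOTIDES[i] in seq_a and NUCLEOTIDES[j] in seq_b."""
    index = {n: i for i, n in enumerate(NUCLEOTIDES)}
    counts = [[0] * 4 for _ in range(4)]
    for a, b in zip(seq_a, seq_b):
        if a in index and b in index:  # skip gaps and ambiguity codes
            counts[index[a]][index[b]] += 1
    return counts


def row_normalize(counts):
    """Turn counts into row-stochastic substitution probabilities."""
    probs = []
    for row in counts:
        total = sum(row)
        if total == 0:
            raise ValueError("a nucleotide never occurs in the first sequence")
        probs.append([c / total for c in row])
    return probs


def mat_mul(x, y):
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matrix_log(p, terms=200):
    """log(P) = sum_{k>=1} (-1)^(k+1) (P - I)^k / k, truncated after `terms` terms.

    Assumption: P is close enough to the identity for the series to converge.
    """
    n = len(p)
    d = [[p[i][j] - (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    power = [row[:] for row in d]
    result = [[0.0] * n for _ in range(n)]
    for k in range(1, terms + 1):
        sign = 1.0 if k % 2 else -1.0
        for i in range(n):
            for j in range(n):
                result[i][j] += sign * power[i][j] / k
        power = mat_mul(power, d)
    return result


def euclidean_distance(x, y):
    """Square root of the summed squared entrywise differences of two matrices."""
    return math.sqrt(sum((a - b) ** 2
                         for rx, ry in zip(x, y)
                         for a, b in zip(rx, ry)))


if __name__ == "__main__":
    # Toy check with a Jukes-Cantor-style rate matrix (every off-diagonal rate = a),
    # whose substitution probabilities after unit time have a closed form.
    a = 0.05
    e = math.exp(-4 * a)
    true_q = [[-3 * a if i == j else a for j in range(4)] for i in range(4)]
    p = [[0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e for j in range(4)]
         for i in range(4)]
    print(euclidean_distance(true_q, matrix_log(p)))
```

With real data, the probability matrix would come from `row_normalize(count_matrix(seq_a, seq_b))` on an aligned sequence pair, so sampling noise from short sequences feeds directly into the estimate, which is consistent with the description's point that sequence length affects which method performs best.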
format Text
id pubmed-2655096
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2655096 2009-03-17 Comparison of methods for estimating the nucleotide substitution matrix Oscamou, Maribeth McDonald, Daniel Yap, Von Bing Huttley, Gavin A Lladser, Manuel E Knight, Rob BMC Bioinformatics Research Article
BioMed Central 2008-12-01 /pmc/articles/PMC2655096/ /pubmed/19046431 http://dx.doi.org/10.1186/1471-2105-9-511 Text en Copyright © 2008 Oscamou et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Comparison of methods for estimating the nucleotide substitution matrix
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2655096/
https://www.ncbi.nlm.nih.gov/pubmed/19046431
http://dx.doi.org/10.1186/1471-2105-9-511