Covariance of maximum likelihood evolutionary distances between sequences aligned pairwise

BACKGROUND: The estimation of a distance between two biological sequences is a fundamental process in molecular evolution. It is usually performed by maximum likelihood (ML) on characters aligned either pairwise or jointly in a multiple sequence alignment (MSA). Estimators for the covariance of pair...

Bibliographic Details
Main Authors:	Dessimoz, Christophe; Gil, Manuel
Format:	Text
Language:	English
Published:	BioMed Central 2008
Subjects:	Methodology Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2443136/ https://www.ncbi.nlm.nih.gov/pubmed/18573206 http://dx.doi.org/10.1186/1471-2148-8-179

collection	PubMed
description	BACKGROUND: The estimation of a distance between two biological sequences is a fundamental process in molecular evolution. It is usually performed by maximum likelihood (ML) on characters aligned either pairwise or jointly in a multiple sequence alignment (MSA). Estimators for the covariance of pairs from an MSA are known, but we are not aware of any solution for cases of pairs aligned independently. In large-scale analyses, it may be too costly to compute MSAs every time distances must be compared, and therefore a covariance estimator for distances estimated from pairs aligned independently is desirable. Knowledge of covariances improves any process that compares or combines distances, such as in generalized least-squares phylogenetic tree building, orthology inference, or lateral gene transfer detection. RESULTS: In this paper, we introduce an estimator for the covariance of distances from sequences aligned pairwise. Its performance is analyzed through extensive Monte Carlo simulations, and compared to the well-known variance estimator of ML distances. Our covariance estimator can be used together with the ML variance estimator to form covariance matrices. CONCLUSION: The estimator performs similarly to the ML variance estimator. In particular, it shows no sign of bias when sequence divergence is below 150 PAM units (i.e. above ~29% expected sequence identity). Above that distance, the covariances tend to be underestimated, but then ML variances are also underestimated.
id	pubmed-2443136
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
spelling	pubmed-2443136 2008-07-07 Covariance of maximum likelihood evolutionary distances between sequences aligned pairwise Dessimoz, Christophe Gil, Manuel BMC Evol Biol Methodology Article BioMed Central 2008-06-23 /pmc/articles/PMC2443136/ /pubmed/18573206 http://dx.doi.org/10.1186/1471-2148-8-179 Text en Copyright ©2008 Dessimoz and Gil; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
