
A new effective method for estimating missing values in the sequence data prior to phylogenetic analysis

In this article we address the problem of phylogenetic inference from nucleic acid data containing missing bases. We introduce a new effective approach, called “Probabilistic estimation of missing values” (PEMV), allowing one to estimate unknown nucleotides prior to computing the evolutionary distan...


Bibliographic Details
Main Authors:	Diallo, Abdoulaye Baniré; Lapointe, François-Joseph; Makarenkov, Vladimir
Format:	Text
Language:	English
Published:	Libertas Academica 2007
Subjects:	Original Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2674658/ https://www.ncbi.nlm.nih.gov/pubmed/19455216

_version_	1782166660137877504
author	Diallo, Abdoulaye Baniré; Lapointe, François-Joseph; Makarenkov, Vladimir
author_facet	Diallo, Abdoulaye Baniré; Lapointe, François-Joseph; Makarenkov, Vladimir
author_sort	Diallo, Abdoulaye Baniré
collection	PubMed
description	In this article we address the problem of phylogenetic inference from nucleic acid data containing missing bases. We introduce a new effective approach, called “Probabilistic estimation of missing values” (PEMV), allowing one to estimate unknown nucleotides prior to computing the evolutionary distances between sequences. We show that the new method improves the accuracy of phylogenetic inference compared to the existing methods “Ignoring Missing Sites” (IMS) and “Proportional Distribution of Missing and Ambiguous Bases” (PDMAB) included in the PAUP software [26]. The proposed strategy for estimating missing nucleotides is based on probabilistic formulae developed in the framework of the Jukes-Cantor [10] and Kimura 2-parameter [11] models. The relative performance of the new method was assessed through simulations carried out with the Seq-Gen program [20], for data generation, and the BioNJ method [7], for inferring phylogenies. We also compared the new method to the DNAML program [5] and “Matrix Representation using Parsimony” (MRP) [13], [19], considering an example of 66 eutherian mammals originally analyzed in [17].
format	Text
id	pubmed-2674658
institution	National Center for Biotechnology Information
language	English
publishDate	2007
publisher	Libertas Academica
record_format	MEDLINE/PubMed
spelling	pubmed-2674658 2009-05-19 A new effective method for estimating missing values in the sequence data prior to phylogenetic analysis Diallo, Abdoulaye Baniré; Lapointe, François-Joseph; Makarenkov, Vladimir Evol Bioinform Online Original Research In this article we address the problem of phylogenetic inference from nucleic acid data containing missing bases. We introduce a new effective approach, called “Probabilistic estimation of missing values” (PEMV), allowing one to estimate unknown nucleotides prior to computing the evolutionary distances between sequences. We show that the new method improves the accuracy of phylogenetic inference compared to the existing methods “Ignoring Missing Sites” (IMS) and “Proportional Distribution of Missing and Ambiguous Bases” (PDMAB) included in the PAUP software [26]. The proposed strategy for estimating missing nucleotides is based on probabilistic formulae developed in the framework of the Jukes-Cantor [10] and Kimura 2-parameter [11] models. The relative performance of the new method was assessed through simulations carried out with the Seq-Gen program [20], for data generation, and the BioNJ method [7], for inferring phylogenies. We also compared the new method to the DNAML program [5] and “Matrix Representation using Parsimony” (MRP) [13], [19], considering an example of 66 eutherian mammals originally analyzed in [17]. Libertas Academica 2007-02-01 /pmc/articles/PMC2674658/ /pubmed/19455216 Text en Copyright © 2006 The authors. http://creativecommons.org/licenses/by/3.0 This article is published under the Creative Commons Attribution By licence. For further information go to: http://creativecommons.org/licenses/by/3.0. (http://creativecommons.org/licenses/by/3.0)
spellingShingle	Original Research Diallo, Abdoulaye Baniré Lapointe, François-Joseph Makarenkov, Vladimir A new effective method for estimating missing values in the sequence data prior to phylogenetic analysis
title	A new effective method for estimating missing values in the sequence data prior to phylogenetic analysis
title_full	A new effective method for estimating missing values in the sequence data prior to phylogenetic analysis
title_fullStr	A new effective method for estimating missing values in the sequence data prior to phylogenetic analysis
title_full_unstemmed	A new effective method for estimating missing values in the sequence data prior to phylogenetic analysis
title_short	A new effective method for estimating missing values in the sequence data prior to phylogenetic analysis
title_sort	new effective method for estimating missing values in the sequence data prior to phylogenetic analysis
topic	Original Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2674658/ https://www.ncbi.nlm.nih.gov/pubmed/19455216
work_keys_str_mv	AT dialloabdoulayebanire aneweffectivemethodforestimatingmissingvaluesinthesequencedatapriortophylogeneticanalysis AT lapointefrancoisjoseph aneweffectivemethodforestimatingmissingvaluesinthesequencedatapriortophylogeneticanalysis AT makarenkovvladimir aneweffectivemethodforestimatingmissingvaluesinthesequencedatapriortophylogeneticanalysis AT dialloabdoulayebanire neweffectivemethodforestimatingmissingvaluesinthesequencedatapriortophylogeneticanalysis AT lapointefrancoisjoseph neweffectivemethodforestimatingmissingvaluesinthesequencedatapriortophylogeneticanalysis AT makarenkovvladimir neweffectivemethodforestimatingmissingvaluesinthesequencedatapriortophylogeneticanalysis
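
Illustrative note: the description field in this record contrasts PEMV with the simpler “Ignoring Missing Sites” (IMS) baseline under the Jukes-Cantor model. The Python sketch below shows one plausible reading of that baseline: sites where either sequence has a missing base are skipped, and the standard Jukes-Cantor correction d = -(3/4) ln(1 - 4p/3) is applied to the remaining sites. It is not the PEMV method and is not taken from the article. The function name `jc69_distance_ims` and the set of symbols treated as missing are assumptions made for this example.

```python
import math

# Assumption: these symbols mark a missing or unknown base in the alignment.
MISSING = {"?", "N", "-"}


def jc69_distance_ims(seq_a, seq_b):
    """Jukes-Cantor distance between two aligned sequences,
    skipping every site where either base is missing (IMS-style)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    compared = 0    # sites where both bases are known
    mismatches = 0  # known sites where the bases differ
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in MISSING or b in MISSING:
            continue
        compared += 1
        if a != b:
            mismatches += 1

    if compared == 0:
        raise ValueError("no sites without missing data")

    p = mismatches / compared  # observed proportion of differing sites
    if p >= 0.75:
        # The Jukes-Cantor correction is undefined at or beyond saturation.
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    # One difference over eight fully known sites.
    print(jc69_distance_ims("ACGTACGT", "ACGTACGA"))
    # The differing site is hidden by a missing base, so it is ignored.
    print(jc69_distance_ims("ACGTACG?", "ACGTACGA"))
```

The second call shows the limitation that motivates estimating missing bases before computing distances: ignoring a site discards whatever evidence that site might have carried.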
