Direct maximum parsimony phylogeny reconstruction from genotype data

BACKGROUND: Maximum parsimony phylogenetic tree reconstruction from genetic variation data is a fundamental problem in computational genetics with many practical applications in population genetics, whole genome analysis, and the search for genetic predictors of disease. Efficient methods are availa...

Bibliographic Details
Main Authors: Sridhar, Srinath; Lam, Fumei; Blelloch, Guy E; Ravi, R; Schwartz, Russell
Format: Text
Language: English
Published: BioMed Central 2007
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2222657/
https://www.ncbi.nlm.nih.gov/pubmed/18053244
http://dx.doi.org/10.1186/1471-2105-8-472
_version_ 1782149366120710144
author Sridhar, Srinath
Lam, Fumei
Blelloch, Guy E
Ravi, R
Schwartz, Russell
author_facet Sridhar, Srinath
Lam, Fumei
Blelloch, Guy E
Ravi, R
Schwartz, Russell
author_sort Sridhar, Srinath
collection PubMed
description BACKGROUND: Maximum parsimony phylogenetic tree reconstruction from genetic variation data is a fundamental problem in computational genetics with many practical applications in population genetics, whole genome analysis, and the search for genetic predictors of disease. Efficient methods are available for reconstruction of maximum parsimony trees from haplotype data, but such data are difficult to determine directly for autosomal DNA. Data are more commonly available in the form of genotypes, which consist of conflated combinations of pairs of haplotypes from homologous chromosomes. Currently, there are no general algorithms for the direct reconstruction of maximum parsimony phylogenies from genotype data. Hence, phylogenetic applications for autosomal data must rely on other methods for first computationally inferring haplotypes from genotypes. RESULTS: In this work, we develop the first practical method for computing maximum parsimony phylogenies directly from genotype data. We show that the standard practice of first inferring haplotypes from genotypes and then reconstructing a phylogeny on the haplotypes often substantially overestimates phylogeny size. As an immediate application, our method can be used to determine the minimum number of mutations required to explain a given set of observed genotypes. CONCLUSION: Phylogeny reconstruction directly from unphased data is computationally feasible for moderate-sized problem instances and can lead to substantially more accurate tree size inferences than the standard practice of treating phasing and phylogeny construction as two separate analysis stages. The difference between the approaches is particularly important for downstream applications that require a lower bound on the number of mutations that the genetic region has undergone.
format Text
id pubmed-2222657
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2222657 2008-02-02 Direct maximum parsimony phylogeny reconstruction from genotype data Sridhar, Srinath Lam, Fumei Blelloch, Guy E Ravi, R Schwartz, Russell BMC Bioinformatics Research Article BACKGROUND: Maximum parsimony phylogenetic tree reconstruction from genetic variation data is a fundamental problem in computational genetics with many practical applications in population genetics, whole genome analysis, and the search for genetic predictors of disease. Efficient methods are available for reconstruction of maximum parsimony trees from haplotype data, but such data are difficult to determine directly for autosomal DNA. Data are more commonly available in the form of genotypes, which consist of conflated combinations of pairs of haplotypes from homologous chromosomes. Currently, there are no general algorithms for the direct reconstruction of maximum parsimony phylogenies from genotype data. Hence, phylogenetic applications for autosomal data must rely on other methods for first computationally inferring haplotypes from genotypes. RESULTS: In this work, we develop the first practical method for computing maximum parsimony phylogenies directly from genotype data. We show that the standard practice of first inferring haplotypes from genotypes and then reconstructing a phylogeny on the haplotypes often substantially overestimates phylogeny size. As an immediate application, our method can be used to determine the minimum number of mutations required to explain a given set of observed genotypes. CONCLUSION: Phylogeny reconstruction directly from unphased data is computationally feasible for moderate-sized problem instances and can lead to substantially more accurate tree size inferences than the standard practice of treating phasing and phylogeny construction as two separate analysis stages.
The difference between the approaches is particularly important for downstream applications that require a lower bound on the number of mutations that the genetic region has undergone. BioMed Central 2007-12-05 /pmc/articles/PMC2222657/ /pubmed/18053244 http://dx.doi.org/10.1186/1471-2105-8-472 Text en Copyright © 2007 Sridhar et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Sridhar, Srinath
Lam, Fumei
Blelloch, Guy E
Ravi, R
Schwartz, Russell
Direct maximum parsimony phylogeny reconstruction from genotype data
title Direct maximum parsimony phylogeny reconstruction from genotype data
title_full Direct maximum parsimony phylogeny reconstruction from genotype data
title_fullStr Direct maximum parsimony phylogeny reconstruction from genotype data
title_full_unstemmed Direct maximum parsimony phylogeny reconstruction from genotype data
title_short Direct maximum parsimony phylogeny reconstruction from genotype data
title_sort direct maximum parsimony phylogeny reconstruction from genotype data
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2222657/
https://www.ncbi.nlm.nih.gov/pubmed/18053244
http://dx.doi.org/10.1186/1471-2105-8-472