Fast and accurate haplotype frequency estimation for large haplotype vectors from pooled DNA data

Bibliographic Details
Main authors: Iliadis, Alexandros, Anastassiou, Dimitris, Wang, Xiaodong
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3560217/
https://www.ncbi.nlm.nih.gov/pubmed/23110720
http://dx.doi.org/10.1186/1471-2156-13-94
_version_ 1782257759242158080
author Iliadis, Alexandros
Anastassiou, Dimitris
Wang, Xiaodong
collection PubMed
description BACKGROUND: Typically, the first phase of a genome-wide association study (GWAS) includes genotyping across hundreds of individuals and validation of the most significant SNPs. Allelotyping of pooled genomic DNA is a common approach to reduce the overall cost of the study. Knowledge of haplotype structure can provide additional information to single-locus analyses. Several methods have been proposed for estimating haplotype frequencies in a population from pooled DNA data. RESULTS: We introduce a technique for haplotype frequency estimation in a population from pooled DNA samples, focusing on datasets containing a small number of individuals per pool (2 or 3 individuals) and a large number of markers. We compare our method with the publicly available state-of-the-art algorithms HIPPO and HAPLOPOOL on datasets with varying numbers of pools and marker sizes. We demonstrate that our algorithm provides improvements in accuracy and computational time over competing methods for large numbers of markers, while demonstrating comparable performance for smaller marker sizes. Our method is implemented in the "Tree-Based Deterministic Sampling Pool" (TDSPool) package, which is available for download at http://www.ee.columbia.edu/~anastas/tdspool. CONCLUSIONS: Using a tree-based deterministic sampling technique, we present an algorithm for haplotype frequency estimation from pooled data. Our method demonstrates superior performance in datasets with a large number of markers and could be the method of choice for haplotype frequency estimation in such datasets.
format Online
Article
Text
id pubmed-3560217
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3560217 2013-02-04 Fast and accurate haplotype frequency estimation for large haplotype vectors from pooled DNA data Iliadis, Alexandros Anastassiou, Dimitris Wang, Xiaodong BMC Genet Research Article BioMed Central 2012-10-30 /pmc/articles/PMC3560217/ /pubmed/23110720 http://dx.doi.org/10.1186/1471-2156-13-94 Text en Copyright ©2012 Iliadis et al.; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Fast and accurate haplotype frequency estimation for large haplotype vectors from pooled DNA data
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3560217/
https://www.ncbi.nlm.nih.gov/pubmed/23110720
http://dx.doi.org/10.1186/1471-2156-13-94