Maximum-parsimony haplotype frequencies inference based on a joint constrained sparse representation of pooled DNA

BACKGROUND: DNA pooling constitutes a cost-effective alternative in genome-wide association studies. In DNA pooling, equimolar amounts of DNA from different individuals are mixed into one sample and the frequency of each allele in each position is observed in a single genotype experiment. The identi...

Bibliographic Details
Main authors: Jajamovich, Guido H; Iliadis, Alexandros; Anastassiou, Dimitris; Wang, Xiaodong
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3847492/
https://www.ncbi.nlm.nih.gov/pubmed/24010487
http://dx.doi.org/10.1186/1471-2105-14-270
_version_ 1782293612038455296
author Jajamovich, Guido H
Iliadis, Alexandros
Anastassiou, Dimitris
Wang, Xiaodong
collection PubMed
description BACKGROUND: DNA pooling constitutes a cost-effective alternative in genome-wide association studies. In DNA pooling, equimolar amounts of DNA from different individuals are mixed into one sample, and the frequency of each allele at each position is observed in a single genotyping experiment. The identification of haplotype frequencies from pooled data, in addition to single-locus analysis, is of separate interest within these studies, as haplotypes could increase statistical power and provide additional insight.
RESULTS: We developed a method for maximum-parsimony haplotype frequency estimation from pooled DNA data based on the sparse representation of the DNA pools in a dictionary of haplotypes. Extensions to scenarios where data are noisy or even missing are also presented. The resulting method is first applied to simulated data based on the haplotypes of the AGT gene and their associated frequencies. We further evaluate our methodology on datasets consisting of SNPs from the first 7 Mb of the HapMap CEU population. Noise and missing data were then introduced into the datasets in order to test the extensions of the proposed method. Both HIPPO and HAPLOPOOL were also applied to these datasets to compare performance.
CONCLUSIONS: We evaluate our methodology on scenarios where pooling is more efficient relative to individual genotyping; that is, in datasets that contain pools with a small number of individuals. We show that in such scenarios our methodology outperforms state-of-the-art methods such as HIPPO and HAPLOPOOL.
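The idea summarized in the abstract, explaining every pool with as few distinct haplotypes as possible, can be illustrated with a toy sketch. This is not the authors' algorithm: it is a brute-force search over a tiny, exhaustive haplotype dictionary, assuming noise-free allele counts and equal pool sizes. All function names, the three-SNP example, and the pool contents are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations, combinations_with_replacement, product


def pool_counts(haplotypes):
    """Per-SNP count of the '1' allele across the haplotypes in one pool."""
    return tuple(sum(column) for column in zip(*haplotypes))


def explain_pool(counts, pool_size, candidates):
    """Return a multiset of candidate haplotypes matching the counts, or None."""
    for multiset in combinations_with_replacement(candidates, pool_size):
        if pool_counts(multiset) == counts:
            return multiset
    return None


def most_parsimonious(pools, pool_size, n_snps):
    """Smallest haplotype set that explains every pool (exhaustive search)."""
    # Toy dictionary: every binary haplotype over n_snps positions.
    dictionary = list(product((0, 1), repeat=n_snps))
    for k in range(1, len(dictionary) + 1):
        for subset in combinations(dictionary, k):
            solutions = [explain_pool(c, pool_size, subset) for c in pools]
            if all(s is not None for s in solutions):
                return subset, solutions
    return None, None


def estimate_frequencies(solutions):
    """Haplotype frequencies from the per-pool decompositions."""
    counts = Counter(h for multiset in solutions for h in multiset)
    total = sum(counts.values())
    return {h: c / total for h, c in counts.items()}


# Hypothetical data: three pools of two diploid individuals (four haplotypes each).
true_pools = [
    [(0, 0, 0), (0, 0, 0), (1, 1, 0), (0, 1, 1)],
    [(1, 1, 0), (1, 1, 0), (0, 1, 1), (0, 0, 0)],
    [(0, 0, 0), (0, 0, 0), (0, 0, 0), (1, 1, 0)],
]
observed = [pool_counts(p) for p in true_pools]

subset, solutions = most_parsimonious(observed, pool_size=4, n_snps=3)
freqs = estimate_frequencies(solutions)
print(len(subset), "haplotypes explain all pools")
for hap, freq in sorted(freqs.items()):
    print(hap, round(freq, 3))
```

The search returns the first minimal set it finds, which need not be the set used to generate the data; a parsimonious explanation is not always unique. Exhaustive enumeration is feasible only for a handful of SNPs and small pools, which is why a sparse-representation formulation such as the one named in the abstract is of interest.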
format Online
Article
Text
id pubmed-3847492
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3847492 2013-12-07 Maximum-parsimony haplotype frequencies inference based on a joint constrained sparse representation of pooled DNA Jajamovich, Guido H Iliadis, Alexandros Anastassiou, Dimitris Wang, Xiaodong BMC Bioinformatics Research Article BioMed Central 2013-09-08 /pmc/articles/PMC3847492/ /pubmed/24010487 http://dx.doi.org/10.1186/1471-2105-14-270 Text en Copyright © 2013 Jajamovich et al.; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Maximum-parsimony haplotype frequencies inference based on a joint constrained sparse representation of pooled DNA
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3847492/
https://www.ncbi.nlm.nih.gov/pubmed/24010487
http://dx.doi.org/10.1186/1471-2105-14-270