Empirical vs Bayesian approach for estimating haplotypes from genotypes of unrelated individuals

BACKGROUND: The completion of the HapMap project has stimulated further development of haplotype-based methodologies for disease associations. A key aspect of such development is the statistical inference of individual diplotypes from unphased genotypes. Several methodologies for inferring haplotype...


Bibliographic Details
Main authors: Li, Shuying Sue, Cheng, Jacob Jen-Hao, Zhao, Lue Ping
Format: Text
Language: English
Published: BioMed Central 2007
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1803795/
https://www.ncbi.nlm.nih.gov/pubmed/17261196
http://dx.doi.org/10.1186/1471-2156-8-2
_version_ 1782132434336219136
author Li, Shuying Sue
Cheng, Jacob Jen-Hao
Zhao, Lue Ping
author_facet Li, Shuying Sue
Cheng, Jacob Jen-Hao
Zhao, Lue Ping
author_sort Li, Shuying Sue
collection PubMed
description BACKGROUND: The completion of the HapMap project has stimulated further development of haplotype-based methodologies for disease associations. A key aspect of such development is the statistical inference of individual diplotypes from unphased genotypes. Several methodologies for inferring haplotypes have been developed, but they have not been evaluated extensively to determine which method not only performs well, but also can be easily incorporated in downstream haplotype-based association analyses. In this paper, we attempt to do so. Our evaluation was carried out by comparing the two leading Bayesian methods, implemented in PHASE and HAPLOTYPER, and the two leading empirical methods, implemented in PL-EM and HPlus. We used these methods to analyze real data, namely the dense genotypes on the X chromosome of 30 European and 30 African trios provided by the International HapMap Project, and simulated genotype data. Our conclusions are based on these analyses. RESULTS: All programs performed very well on the X-chromosome data, with an average similarity index of 0.99 and an average prediction rate of 0.99 for both European and African trios. On data simulated under an approximation of the coalescent, PHASE, which implements a Bayesian method based on the coalescent approximation, outperformed the other programs for small sample sizes. When the sample size increased, the other programs performed as well as PHASE. PL-EM and HPlus, which implement empirical methods, required much less running time than the programs implementing the Bayesian methods. They required only one-hundredth or one-thousandth of the running time required by PHASE, particularly when analyzing large sample sizes and large numbers of SNPs.
CONCLUSION: For large sample sizes (hundreds or more), which most association studies require, the two empirical methods might be used since they infer the haplotypes as accurately as any Bayesian methods and can be incorporated easily into downstream haplotype-based analyses such as haplotype-association analyses.
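As an illustration of the inference problem described above (recovering diplotypes from unphased genotypes), the following is a minimal sketch of a plain expectation-maximization (EM) estimate of haplotype frequencies under Hardy-Weinberg equilibrium. It is not the algorithm implemented in PL-EM, HPlus, PHASE, or HAPLOTYPER; the function names and the genotype encoding (0, 1, or 2 copies of allele 1 per SNP) are assumptions made only for this example, and the enumeration is practical only for a handful of SNPs.

```python
from collections import defaultdict
from itertools import product


def compatible_pairs(genotype):
    """List the unordered haplotype pairs consistent with an unphased genotype.

    genotype: tuple of allele counts per SNP (hypothetical encoding: 0, 1 or 2).
    """
    het = [i for i, g in enumerate(genotype) if g == 1]
    base = [g // 2 for g in genotype]  # homozygous sites are fixed: 0 -> 0, 2 -> 1
    if not het:
        h = tuple(base)
        return [(h, h)]
    pairs = []
    # Fix the phase of the first heterozygous site so each pair is listed once.
    for bits in product((0, 1), repeat=len(het) - 1):
        h1, h2 = list(base), list(base)
        h1[het[0]], h2[het[0]] = 0, 1
        for site, b in zip(het[1:], bits):
            h1[site], h2[site] = b, 1 - b
        pairs.append((tuple(h1), tuple(h2)))
    return pairs


def em_haplotype_frequencies(genotypes, n_iter=200, tol=1e-10):
    """Estimate haplotype frequencies by EM, assuming Hardy-Weinberg equilibrium."""
    candidates = [compatible_pairs(g) for g in genotypes]
    haps = sorted({h for pairs in candidates for pair in pairs for h in pair})
    freq = {h: 1.0 / len(haps) for h in haps}  # uniform starting values
    n_chrom = 2 * len(genotypes)
    for _ in range(n_iter):
        counts = defaultdict(float)
        for pairs in candidates:
            # E-step: posterior weight of each compatible pair. The factor 2 for
            # heterozygous pairs is constant within a genotype, so it cancels.
            weights = [freq[h1] * freq[h2] for h1, h2 in pairs]
            total = sum(weights)
            for (h1, h2), w in zip(pairs, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: expected haplotype counts divided by the number of chromosomes.
        new_freq = {h: counts[h] / n_chrom for h in haps}
        change = max(abs(new_freq[h] - freq[h]) for h in haps)
        freq = new_freq
        if change < tol:
            break
    return freq


def most_likely_diplotype(genotype, freq):
    """Return the compatible haplotype pair with the highest estimated probability."""
    return max(
        compatible_pairs(genotype),
        key=lambda p: freq.get(p[0], 0.0) * freq.get(p[1], 0.0),
    )


# Toy data: eight homozygotes and two double heterozygotes at two SNPs.
genotypes = [(0, 0)] * 4 + [(2, 2)] * 4 + [(1, 1)] * 2
freq = em_haplotype_frequencies(genotypes)
print(most_likely_diplotype((1, 1), freq))
```

In this toy data set the homozygous individuals support only the haplotypes (0, 0) and (1, 1), so the ambiguous double heterozygotes are resolved in favor of that pair. Real programs add refinements, such as partitioning long marker sets or placing priors on haplotype frequencies, that this sketch omits.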
format Text
id pubmed-1803795
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1803795 2007-02-23 Empirical vs Bayesian approach for estimating haplotypes from genotypes of unrelated individuals Li, Shuying Sue Cheng, Jacob Jen-Hao Zhao, Lue Ping BMC Genet Research Article BACKGROUND: The completion of the HapMap project has stimulated further development of haplotype-based methodologies for disease associations. A key aspect of such development is the statistical inference of individual diplotypes from unphased genotypes. Several methodologies for inferring haplotypes have been developed, but they have not been evaluated extensively to determine which method not only performs well, but also can be easily incorporated in downstream haplotype-based association analyses. In this paper, we attempt to do so. Our evaluation was carried out by comparing the two leading Bayesian methods, implemented in PHASE and HAPLOTYPER, and the two leading empirical methods, implemented in PL-EM and HPlus. We used these methods to analyze real data, namely the dense genotypes on the X chromosome of 30 European and 30 African trios provided by the International HapMap Project, and simulated genotype data. Our conclusions are based on these analyses. RESULTS: All programs performed very well on the X-chromosome data, with an average similarity index of 0.99 and an average prediction rate of 0.99 for both European and African trios. On data simulated under an approximation of the coalescent, PHASE, which implements a Bayesian method based on the coalescent approximation, outperformed the other programs for small sample sizes. When the sample size increased, the other programs performed as well as PHASE. PL-EM and HPlus, which implement empirical methods, required much less running time than the programs implementing the Bayesian methods. They required only one-hundredth or one-thousandth of the running time required by PHASE, particularly when analyzing large sample sizes and large numbers of SNPs.
CONCLUSION: For large sample sizes (hundreds or more), which most association studies require, the two empirical methods might be used since they infer the haplotypes as accurately as any Bayesian methods and can be incorporated easily into downstream haplotype-based analyses such as haplotype-association analyses. BioMed Central 2007-01-29 /pmc/articles/PMC1803795/ /pubmed/17261196 http://dx.doi.org/10.1186/1471-2156-8-2 Text en Copyright © 2007 Li et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Li, Shuying Sue
Cheng, Jacob Jen-Hao
Zhao, Lue Ping
Empirical vs Bayesian approach for estimating haplotypes from genotypes of unrelated individuals
title Empirical vs Bayesian approach for estimating haplotypes from genotypes of unrelated individuals
title_full Empirical vs Bayesian approach for estimating haplotypes from genotypes of unrelated individuals
title_fullStr Empirical vs Bayesian approach for estimating haplotypes from genotypes of unrelated individuals
title_full_unstemmed Empirical vs Bayesian approach for estimating haplotypes from genotypes of unrelated individuals
title_short Empirical vs Bayesian approach for estimating haplotypes from genotypes of unrelated individuals
title_sort empirical vs bayesian approach for estimating haplotypes from genotypes of unrelated individuals
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1803795/
https://www.ncbi.nlm.nih.gov/pubmed/17261196
http://dx.doi.org/10.1186/1471-2156-8-2
work_keys_str_mv AT lishuyingsue empiricalvsbayesianapproachforestimatinghaplotypesfromgenotypesofunrelatedindividuals
AT chengjacobjenhao empiricalvsbayesianapproachforestimatinghaplotypesfromgenotypesofunrelatedindividuals
AT zhaolueping empiricalvsbayesianapproachforestimatinghaplotypesfromgenotypesofunrelatedindividuals