
MixSIH: a mixture model for single individual haplotyping



Bibliographic Details
Main authors: Matsumoto, Hirotaka; Kiryu, Hisanori
Format: Online, Article, Text
Language: English
Published: BioMed Central, 2013
Subjects: Research
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3582441/
https://www.ncbi.nlm.nih.gov/pubmed/23445519
http://dx.doi.org/10.1186/1471-2164-14-S2-S5
Abstract:
BACKGROUND: Haplotype information is useful for various genetic analyses, including genome-wide association studies. Determining haplotypes experimentally is difficult, and several computational approaches infer haplotypes from genomic data. Among these, single individual haplotyping, or haplotype assembly, which infers the two haplotypes of an individual from aligned sequence fragments, has attracted considerable attention. To avoid incorrect results in downstream analyses, it is important not only to assemble haplotypes that are as long as possible but also to provide a means of extracting highly reliable haplotype regions. Although several efficient algorithms exist for haplotype assembly, there is no efficient method for extracting the regions assembled with high confidence.
RESULTS: We develop a probabilistic model, called MixSIH, for solving the haplotype assembly problem. The model has two mixture components representing the two haplotypes. Based on the optimized model, we define a quality score for each segment in the haplotype assembly, which we call the 'minimum connectivity' (MC) score. Existing accuracy measures for haplotype assembly are designed to compare the efficiency of algorithms and are not suitable for evaluating the quality of a set of partially assembled haplotype segments; we therefore develop an accuracy measure based on pairwise consistency and evaluate accuracy on simulated and real data. Using the MC scores, our algorithm can extract highly accurate haplotype segments. We also show evidence that an existing experimental dataset contains chimeric read fragments derived from different haplotypes, which significantly degrade the quality of the assembled haplotypes.
CONCLUSIONS: We develop a novel method for solving the haplotype assembly problem. We also define a model-based quality score that indicates the accuracy of haplotype segments. In our evaluation, MixSIH successfully extracted reliable haplotype segments. The C++ source code of MixSIH is available at https://sites.google.com/site/hmatsu1226/software/mixsih.
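The abstract describes the model only at a high level: two mixture components, one per haplotype, fitted to aligned read fragments. As a rough illustration of that general idea, and explicitly not MixSIH's actual model, optimizer, or MC score (none of which the abstract specifies), here is a minimal sketch of a generic two-component Bernoulli mixture fitted by expectation-maximization (EM). The fragment encoding (dicts from SNP index to allele 0/1), the seeding from the first fragment, the pseudocount `alpha`, the toy data, and the function names `fit_two_component_mixture` and `call_haplotypes` are all hypothetical choices made for this example.

```python
import math


def fit_two_component_mixture(fragments, n_sites, n_iter=200, alpha=0.1, tol=1e-9):
    """Fit a generic two-component Bernoulli mixture to read fragments by EM.

    fragments: list of dicts mapping SNP index -> observed allele (0 or 1).
    Returns (pi, theta, resp): mixing weights, theta[k][j] = estimated
    probability that component k carries allele 1 at site j, and the
    per-fragment responsibilities from the last E-step.
    """
    # Hypothetical initialization: seed component 0 from the first fragment and
    # component 1 from its complement, so the two components start out distinct.
    theta = [[0.5] * n_sites, [0.5] * n_sites]
    for j, a in fragments[0].items():
        theta[0][j] = 0.9 if a == 1 else 0.1
        theta[1][j] = 1.0 - theta[0][j]
    pi = [0.5, 0.5]
    prev_ll = -math.inf
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability that each fragment came from each
        # component, computed in log space for numerical stability.
        resp, ll = [], 0.0
        for frag in fragments:
            logp = []
            for k in range(2):
                lp = math.log(pi[k])
                for j, a in frag.items():
                    lp += math.log(theta[k][j] if a == 1 else 1.0 - theta[k][j])
                logp.append(lp)
            m = max(logp)
            z = sum(math.exp(x - m) for x in logp)
            ll += m + math.log(z)
            resp.append([math.exp(x - m) / z for x in logp])
        # M-step: responsibility-weighted allele frequencies per site; the
        # pseudocount alpha keeps every theta strictly inside (0, 1).
        for k in range(2):
            ones = [0.0] * n_sites
            cover = [0.0] * n_sites
            for frag, r in zip(fragments, resp):
                for j, a in frag.items():
                    cover[j] += r[k]
                    ones[j] += r[k] * a
            theta[k] = [(ones[j] + alpha) / (cover[j] + 2 * alpha) for j in range(n_sites)]
            pi[k] = max(sum(r[k] for r in resp) / len(fragments), 1e-12)
        # Stop once the data log-likelihood no longer improves by more than tol.
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return pi, theta, resp


def call_haplotypes(theta):
    """Round each component's per-site probabilities to a 0/1 haplotype."""
    return [[1 if p > 0.5 else 0 for p in row] for row in theta]


# Toy, error-free fragments from haplotypes 01101 and 10010 (made-up data).
fragments = [
    {0: 0, 1: 1, 2: 1},
    {1: 0, 2: 0, 3: 1},
    {2: 1, 3: 0, 4: 1},
    {0: 1, 1: 0},
    {3: 1, 4: 0},
    {1: 1, 2: 1, 3: 0},
]
pi, theta, resp = fit_two_component_mixture(fragments, n_sites=5)
print(call_haplotypes(theta))  # [[0, 1, 1, 0, 1], [1, 0, 0, 1, 0]]
```

Seeding from the first fragment is only one way to break the symmetry between the two components; random restarts are a common alternative. The sketch also omits anything corresponding to the MC score, since the abstract does not define how that score is computed.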
Record ID: pubmed-3582441
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: BMC Genomics (Research)
Published: BioMed Central, 2013-02-15
Copyright ©2013 Matsumoto and Kiryu; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.