
Haplotype inference based on Hidden Markov Models in the QTL-MAS 2010 multi-generational dataset

BACKGROUND: We have previously demonstrated an approach for efficient computation of genotype probabilities, and more generally probabilities of allele inheritance in inbred as well as outbred populations. That work also included an extension for haplotype inference, or phasing, using Hidden Markov...


Bibliographic Details
Main author: Nettelblad, Carl
Format: Text
Language: English
Published: BioMed Central 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3103195/
https://www.ncbi.nlm.nih.gov/pubmed/21624166
http://dx.doi.org/10.1186/1753-6561-5-S3-S10
author Nettelblad, Carl
collection PubMed
description BACKGROUND: We have previously demonstrated an approach for efficient computation of genotype probabilities, and more generally probabilities of allele inheritance in inbred as well as outbred populations. That work also included an extension for haplotype inference, or phasing, using Hidden Markov Models. Computational phasing of multi-thousand marker datasets has not become common as of yet. In this communication, we further investigate the method presented earlier for such problems, in a multi-generational dataset simulated for QTL detection. RESULTS: When analyzing the dataset simulated for the 14th QTLMAS workshop, the phasing produced showed zero deviations compared to original simulated phase in the founder generation. In total, 99.93% of all markers were correctly phased. 97.68% of the individuals were correct in all markers over all 5 simulated chromosomes. Results were produced over a weekend on a small computational cluster. The specific algorithmic adaptations needed for the Markov model training approach in order to reach convergence are described. CONCLUSIONS: Our method provides efficient, near-perfect haplotype inference allowing the determination of completely phased genomes in dense pedigrees. These developments are of special value for applications where marker alleles are not corresponding directly to QTL alleles, thus necessitating tracking of allele origin, and in complex multi-generational crosses. The cnF2freq codebase, which is in a current state of active development, is available under a BSD-style license.
format Text
id pubmed-3103195
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3103195 2011-05-28 Nettelblad, Carl BMC Proc Proceedings BioMed Central 2011-05-27 /pmc/articles/PMC3103195/ /pubmed/21624166 http://dx.doi.org/10.1186/1753-6561-5-S3-S10 Text en Copyright ©2011 Nettelblad; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Haplotype inference based on Hidden Markov Models in the QTL-MAS 2010 multi-generational dataset
topic Proceedings