
A fast algorithm for genome-wide haplotype pattern mining

BACKGROUND: Identifying the genetic components of common diseases has long been an important area of research. Recently, genotyping technology has reached the level where it is cost effective to genotype single nucleotide polymorphism (SNP) markers covering the entire genome, in thousands of individ...


Bibliographic Details
Main authors:	Besenbacher, Søren; Pedersen, Christian NS; Mailund, Thomas
Format:	Text
Language:	English
Published:	BioMed Central 2009
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2648728/ https://www.ncbi.nlm.nih.gov/pubmed/19208179 http://dx.doi.org/10.1186/1471-2105-10-S1-S74

_version_	1782164974281424896
author	Besenbacher, Søren Pedersen, Christian NS Mailund, Thomas
author_facet	Besenbacher, Søren Pedersen, Christian NS Mailund, Thomas
author_sort	Besenbacher, Søren
collection	PubMed
description	BACKGROUND: Identifying the genetic components of common diseases has long been an important area of research. Recently, genotyping technology has reached the level where it is cost effective to genotype single nucleotide polymorphism (SNP) markers covering the entire genome, in thousands of individuals, and analyse such data for markers associated with a disease. The statistical power to detect association, however, is limited when markers are analysed one at a time. This can be alleviated by considering multiple markers simultaneously. The Haplotype Pattern Mining (HPM) method is a machine learning approach to do exactly this. RESULTS: We present a new, faster algorithm for the HPM method. The new approach uses patterns of haplotype diversity in the genome: locally in the genome, the number of observed haplotypes is much smaller than the total number of possible haplotypes. We show that the new approach speeds up the HPM method by a factor of 2 on a genome-wide dataset with 5009 individuals typed at 491208 markers using default parameters, and by more if the pattern length is increased. CONCLUSION: The new algorithm speeds up the HPM method and we show that it is feasible to apply HPM to whole genome association mapping with thousands of individuals and hundreds of thousands of markers.
format	Text
id	pubmed-2648728
institution	National Center for Biotechnology Information
language	English
publishDate	2009
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-2648728 2009-02-28 A fast algorithm for genome-wide haplotype pattern mining Besenbacher, Søren Pedersen, Christian NS Mailund, Thomas BMC Bioinformatics Research BACKGROUND: Identifying the genetic components of common diseases has long been an important area of research. Recently, genotyping technology has reached the level where it is cost effective to genotype single nucleotide polymorphism (SNP) markers covering the entire genome, in thousands of individuals, and analyse such data for markers associated with a disease. The statistical power to detect association, however, is limited when markers are analysed one at a time. This can be alleviated by considering multiple markers simultaneously. The Haplotype Pattern Mining (HPM) method is a machine learning approach to do exactly this. RESULTS: We present a new, faster algorithm for the HPM method. The new approach uses patterns of haplotype diversity in the genome: locally in the genome, the number of observed haplotypes is much smaller than the total number of possible haplotypes. We show that the new approach speeds up the HPM method by a factor of 2 on a genome-wide dataset with 5009 individuals typed at 491208 markers using default parameters, and by more if the pattern length is increased. CONCLUSION: The new algorithm speeds up the HPM method and we show that it is feasible to apply HPM to whole genome association mapping with thousands of individuals and hundreds of thousands of markers. BioMed Central 2009-01-30 /pmc/articles/PMC2648728/ /pubmed/19208179 http://dx.doi.org/10.1186/1471-2105-10-S1-S74 Text en Copyright © 2009 Besenbacher et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Research Besenbacher, Søren Pedersen, Christian NS Mailund, Thomas A fast algorithm for genome-wide haplotype pattern mining
title	A fast algorithm for genome-wide haplotype pattern mining
title_full	A fast algorithm for genome-wide haplotype pattern mining
title_fullStr	A fast algorithm for genome-wide haplotype pattern mining
title_full_unstemmed	A fast algorithm for genome-wide haplotype pattern mining
title_short	A fast algorithm for genome-wide haplotype pattern mining
title_sort	fast algorithm for genome-wide haplotype pattern mining
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2648728/ https://www.ncbi.nlm.nih.gov/pubmed/19208179 http://dx.doi.org/10.1186/1471-2105-10-S1-S74
work_keys_str_mv	AT besenbachersøren afastalgorithmforgenomewidehaplotypepatternmining AT pedersenchristianns afastalgorithmforgenomewidehaplotypepatternmining AT mailundthomas afastalgorithmforgenomewidehaplotypepatternmining AT besenbachersøren fastalgorithmforgenomewidehaplotypepatternmining AT pedersenchristianns fastalgorithmforgenomewidehaplotypepatternmining AT mailundthomas fastalgorithmforgenomewidehaplotypepatternmining
