Cargando…

Approximation properties of haplotype tagging

BACKGROUND: Single nucleotide polymorphisms (SNPs) are locations at which the genomic sequences of population members differ. Since these differences are known to follow patterns, disease association studies are facilitated by identifying SNPs that allow the unique identification of such patterns. T...

Descripción completa

Detalles Bibliográficos
Autores principales:	Vinterbo, Staal A, Dreiseitl, Stephan, Ohno-Machado, Lucila
Formato:	Texto
Lenguaje:	English
Publicado:	BioMed Central 2006
Materias:	Methodology Article
Acceso en línea:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1395335/ https://www.ncbi.nlm.nih.gov/pubmed/16401341 http://dx.doi.org/10.1186/1471-2105-7-8

_version_	1782126949851725824
author	Vinterbo, Staal A Dreiseitl, Stephan Ohno-Machado, Lucila
author_facet	Vinterbo, Staal A Dreiseitl, Stephan Ohno-Machado, Lucila
author_sort	Vinterbo, Staal A
collection	PubMed
description	BACKGROUND: Single nucleotide polymorphisms (SNPs) are locations at which the genomic sequences of population members differ. Since these differences are known to follow patterns, disease association studies are facilitated by identifying SNPs that allow the unique identification of such patterns. This process, known as haplotype tagging, is formulated as a combinatorial optimization problem and analyzed in terms of complexity and approximation properties. RESULTS: It is shown that the tagging problem is NP-hard but approximable within 1 + ln((n(2 )- n)/2) for n haplotypes but not approximable within (1 - ε) ln(n/2) for any ε > 0 unless NP ⊂ DTIME(n(log log n)). A simple, very easily implementable algorithm that exhibits the above upper bound on solution quality is presented. This algorithm has running time O([Image: see text] (2m - p + 1)) ≤ O(m(n(2 )- n)/2) where p ≤ min(n, m) for n haplotypes of size m. As we show that the approximation bound is asymptotically tight, the algorithm presented is optimal with respect to this asymptotic bound. CONCLUSION: The haplotype tagging problem is hard, but approachable with a fast, practical, and surprisingly simple algorithm that cannot be significantly improved upon on a single processor machine. Hence, significant improvement in computatational efforts expended can only be expected if the computational effort is distributed and done in parallel.
format	Text
id	pubmed-1395335
institution	National Center for Biotechnology Information
language	English
publishDate	2006
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-13953352006-03-09 Approximation properties of haplotype tagging Vinterbo, Staal A Dreiseitl, Stephan Ohno-Machado, Lucila BMC Bioinformatics Methodology Article BACKGROUND: Single nucleotide polymorphisms (SNPs) are locations at which the genomic sequences of population members differ. Since these differences are known to follow patterns, disease association studies are facilitated by identifying SNPs that allow the unique identification of such patterns. This process, known as haplotype tagging, is formulated as a combinatorial optimization problem and analyzed in terms of complexity and approximation properties. RESULTS: It is shown that the tagging problem is NP-hard but approximable within 1 + ln((n(2 )- n)/2) for n haplotypes but not approximable within (1 - ε) ln(n/2) for any ε > 0 unless NP ⊂ DTIME(n(log log n)). A simple, very easily implementable algorithm that exhibits the above upper bound on solution quality is presented. This algorithm has running time O([Image: see text] (2m - p + 1)) ≤ O(m(n(2 )- n)/2) where p ≤ min(n, m) for n haplotypes of size m. As we show that the approximation bound is asymptotically tight, the algorithm presented is optimal with respect to this asymptotic bound. CONCLUSION: The haplotype tagging problem is hard, but approachable with a fast, practical, and surprisingly simple algorithm that cannot be significantly improved upon on a single processor machine. Hence, significant improvement in computatational efforts expended can only be expected if the computational effort is distributed and done in parallel. BioMed Central 2006-01-09 /pmc/articles/PMC1395335/ /pubmed/16401341 http://dx.doi.org/10.1186/1471-2105-7-8 Text en Copyright © 2006 Vinterbo et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( (http://creativecommons.org/licenses/by/2.0) ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Methodology Article Vinterbo, Staal A Dreiseitl, Stephan Ohno-Machado, Lucila Approximation properties of haplotype tagging
title	Approximation properties of haplotype tagging
title_full	Approximation properties of haplotype tagging
title_fullStr	Approximation properties of haplotype tagging
title_full_unstemmed	Approximation properties of haplotype tagging
title_short	Approximation properties of haplotype tagging
title_sort	approximation properties of haplotype tagging
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1395335/ https://www.ncbi.nlm.nih.gov/pubmed/16401341 http://dx.doi.org/10.1186/1471-2105-7-8
work_keys_str_mv	AT vinterbostaala approximationpropertiesofhaplotypetagging AT dreiseitlstephan approximationpropertiesofhaplotypetagging AT ohnomachadolucila approximationpropertiesofhaplotypetagging

Approximation properties of haplotype tagging

Ejemplares similares