Cargando…

A highly accurate heuristic algorithm for the haplotype assembly problem

BACKGROUND: Single nucleotide polymorphisms (SNPs) are the most common form of genetic variation in human DNA. The sequence of SNPs in each of the two copies of a given chromosome in a diploid organism is referred to as a haplotype. Haplotype information has many applications such as gene disease di...

Descripción completa

Detalles Bibliográficos
Autores principales: Deng, Fei, Cui, Wenjuan, Wang, Lusheng
Formato: Online Artículo Texto
Lenguaje:English
Publicado: BioMed Central 2013
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3582451/
https://www.ncbi.nlm.nih.gov/pubmed/23445458
http://dx.doi.org/10.1186/1471-2164-14-S2-S2
_version_ 1782260567208099840
author Deng, Fei
Cui, Wenjuan
Wang, Lusheng
author_facet Deng, Fei
Cui, Wenjuan
Wang, Lusheng
author_sort Deng, Fei
collection PubMed
description BACKGROUND: Single nucleotide polymorphisms (SNPs) are the most common form of genetic variation in human DNA. The sequence of SNPs in each of the two copies of a given chromosome in a diploid organism is referred to as a haplotype. Haplotype information has many applications such as gene disease diagnoses, drug design, etc. The haplotype assembly problem is defined as follows: Given a set of fragments sequenced from the two copies of a chromosome of a single individual, and their locations in the chromosome, which can be pre-determined by aligning the fragments to a reference DNA sequence, the goal here is to reconstruct two haplotypes (h(1), h(2)) from the input fragments. Existing algorithms do not work well when the error rate of fragments is high. Here we design an algorithm that can give accurate solutions, even if the error rate of fragments is high. RESULTS: We first give a dynamic programming algorithm that can give exact solutions to the haplotype assembly problem. The time complexity of the algorithm is O(n × 2(t )× t), where n is the number of SNPs, and t is the maximum coverage of a SNP site. The algorithm is slow when t is large. To solve the problem when t is large, we further propose a heuristic algorithm on the basis of the dynamic programming algorithm. Experiments show that our heuristic algorithm can give very accurate solutions. CONCLUSIONS: We have tested our algorithm on a set of benchmark datasets. Experiments show that our algorithm can give very accurate solutions. It outperforms most of the existing programs when the error rate of the input fragments is high.
format Online
Article
Text
id pubmed-3582451
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-35824512013-03-05 A highly accurate heuristic algorithm for the haplotype assembly problem Deng, Fei Cui, Wenjuan Wang, Lusheng BMC Genomics Research BACKGROUND: Single nucleotide polymorphisms (SNPs) are the most common form of genetic variation in human DNA. The sequence of SNPs in each of the two copies of a given chromosome in a diploid organism is referred to as a haplotype. Haplotype information has many applications such as gene disease diagnoses, drug design, etc. The haplotype assembly problem is defined as follows: Given a set of fragments sequenced from the two copies of a chromosome of a single individual, and their locations in the chromosome, which can be pre-determined by aligning the fragments to a reference DNA sequence, the goal here is to reconstruct two haplotypes (h(1), h(2)) from the input fragments. Existing algorithms do not work well when the error rate of fragments is high. Here we design an algorithm that can give accurate solutions, even if the error rate of fragments is high. RESULTS: We first give a dynamic programming algorithm that can give exact solutions to the haplotype assembly problem. The time complexity of the algorithm is O(n × 2(t )× t), where n is the number of SNPs, and t is the maximum coverage of a SNP site. The algorithm is slow when t is large. To solve the problem when t is large, we further propose a heuristic algorithm on the basis of the dynamic programming algorithm. Experiments show that our heuristic algorithm can give very accurate solutions. CONCLUSIONS: We have tested our algorithm on a set of benchmark datasets. Experiments show that our algorithm can give very accurate solutions. It outperforms most of the existing programs when the error rate of the input fragments is high. BioMed Central 2013-02-15 /pmc/articles/PMC3582451/ /pubmed/23445458 http://dx.doi.org/10.1186/1471-2164-14-S2-S2 Text en Copyright ©2013 Deng et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research
Deng, Fei
Cui, Wenjuan
Wang, Lusheng
A highly accurate heuristic algorithm for the haplotype assembly problem
title A highly accurate heuristic algorithm for the haplotype assembly problem
title_full A highly accurate heuristic algorithm for the haplotype assembly problem
title_fullStr A highly accurate heuristic algorithm for the haplotype assembly problem
title_full_unstemmed A highly accurate heuristic algorithm for the haplotype assembly problem
title_short A highly accurate heuristic algorithm for the haplotype assembly problem
title_sort highly accurate heuristic algorithm for the haplotype assembly problem
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3582451/
https://www.ncbi.nlm.nih.gov/pubmed/23445458
http://dx.doi.org/10.1186/1471-2164-14-S2-S2
work_keys_str_mv AT dengfei ahighlyaccurateheuristicalgorithmforthehaplotypeassemblyproblem
AT cuiwenjuan ahighlyaccurateheuristicalgorithmforthehaplotypeassemblyproblem
AT wanglusheng ahighlyaccurateheuristicalgorithmforthehaplotypeassemblyproblem
AT dengfei highlyaccurateheuristicalgorithmforthehaplotypeassemblyproblem
AT cuiwenjuan highlyaccurateheuristicalgorithmforthehaplotypeassemblyproblem
AT wanglusheng highlyaccurateheuristicalgorithmforthehaplotypeassemblyproblem