Cargando…

Perfect Hamming code with a hash table for faster genome mapping

BACKGROUND: With the advent of next-generation sequencers, the growing demands to map short DNA sequences to a genome have promoted the development of fast algorithms and tools. The tools commonly used today are based on either a hash table or the suffix array/Burrow–Wheeler transform. These algorit...

Descripción completa

Detalles Bibliográficos
Autores principales:	Takenaka, Yoichi, Seno, Shigeto, Matsuda, Hideo
Formato:	Online Artículo Texto
Lenguaje:	English
Publicado:	BioMed Central 2011
Materias:	Proceedings
Acceso en línea:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3333191/ https://www.ncbi.nlm.nih.gov/pubmed/22369457 http://dx.doi.org/10.1186/1471-2164-12-S3-S8

_version_	1782230393867468800
author	Takenaka, Yoichi Seno, Shigeto Matsuda, Hideo
author_facet	Takenaka, Yoichi Seno, Shigeto Matsuda, Hideo
author_sort	Takenaka, Yoichi
collection	PubMed
description	BACKGROUND: With the advent of next-generation sequencers, the growing demands to map short DNA sequences to a genome have promoted the development of fast algorithms and tools. The tools commonly used today are based on either a hash table or the suffix array/Burrow–Wheeler transform. These algorithms are the best suited to finding the genome position of exactly matching short reads. However, they have limited capacity to handle the mismatches. To find n-mismatches, they requires O(2(n)) times the computation time of exact matches. Therefore, acceleration techniques are required. RESULTS: We propose a hash-based method for genome mapping that reduces the number of hash references for finding mismatches without increasing the size of the hash table. The method regards DNA subsequences as words on Galois extension field GF(2(2)) and each word is encoded to a code word of a perfect Hamming code. The perfect Hamming code defines equivalence classes of DNA subsequences. Each equivalence class includes subsequence whose corresponding words on GF(2(2)) are encoded to a corresponding code word. The code word is used as a hash key to store these subsequences in a hash table. Specifically, it reduces by about 70% the number of hash keys necessary for searching the genome positions of all 2-mismatches of 21-base-long DNA subsequence. CONCLUSIONS: The paper shows perfect hamming code can reduce the number of hash references for hash-based genome mapping. As the computation time to calculate code words is far shorter than a hash reference, our method is effective to reduce the computation time to map short DNA sequences to genome. The amount of data that DNA sequencers generate continues to increase and more accurate genome mappings are required. Thus our method will be a key technology to develop faster genome mapping software.
format	Online Article Text
id	pubmed-3333191
institution	National Center for Biotechnology Information
language	English
publishDate	2011
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-33331912012-04-24 Perfect Hamming code with a hash table for faster genome mapping Takenaka, Yoichi Seno, Shigeto Matsuda, Hideo BMC Genomics Proceedings BACKGROUND: With the advent of next-generation sequencers, the growing demands to map short DNA sequences to a genome have promoted the development of fast algorithms and tools. The tools commonly used today are based on either a hash table or the suffix array/Burrow–Wheeler transform. These algorithms are the best suited to finding the genome position of exactly matching short reads. However, they have limited capacity to handle the mismatches. To find n-mismatches, they requires O(2(n)) times the computation time of exact matches. Therefore, acceleration techniques are required. RESULTS: We propose a hash-based method for genome mapping that reduces the number of hash references for finding mismatches without increasing the size of the hash table. The method regards DNA subsequences as words on Galois extension field GF(2(2)) and each word is encoded to a code word of a perfect Hamming code. The perfect Hamming code defines equivalence classes of DNA subsequences. Each equivalence class includes subsequence whose corresponding words on GF(2(2)) are encoded to a corresponding code word. The code word is used as a hash key to store these subsequences in a hash table. Specifically, it reduces by about 70% the number of hash keys necessary for searching the genome positions of all 2-mismatches of 21-base-long DNA subsequence. CONCLUSIONS: The paper shows perfect hamming code can reduce the number of hash references for hash-based genome mapping. As the computation time to calculate code words is far shorter than a hash reference, our method is effective to reduce the computation time to map short DNA sequences to genome. The amount of data that DNA sequencers generate continues to increase and more accurate genome mappings are required. Thus our method will be a key technology to develop faster genome mapping software. BioMed Central 2011-11-30 /pmc/articles/PMC3333191/ /pubmed/22369457 http://dx.doi.org/10.1186/1471-2164-12-S3-S8 Text en Copyright ©2011 Takenaka et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Proceedings Takenaka, Yoichi Seno, Shigeto Matsuda, Hideo Perfect Hamming code with a hash table for faster genome mapping
title	Perfect Hamming code with a hash table for faster genome mapping
title_full	Perfect Hamming code with a hash table for faster genome mapping
title_fullStr	Perfect Hamming code with a hash table for faster genome mapping
title_full_unstemmed	Perfect Hamming code with a hash table for faster genome mapping
title_short	Perfect Hamming code with a hash table for faster genome mapping
title_sort	perfect hamming code with a hash table for faster genome mapping
topic	Proceedings
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3333191/ https://www.ncbi.nlm.nih.gov/pubmed/22369457 http://dx.doi.org/10.1186/1471-2164-12-S3-S8
work_keys_str_mv	AT takenakayoichi perfecthammingcodewithahashtableforfastergenomemapping AT senoshigeto perfecthammingcodewithahashtableforfastergenomemapping AT matsudahideo perfecthammingcodewithahashtableforfastergenomemapping

Perfect Hamming code with a hash table for faster genome mapping

Ejemplares similares