
HIA: a genome mapper using hybrid index-based sequence alignment

BACKGROUND: A number of alignment tools have been developed to align sequencing reads to the human reference genome. The scale of information from next-generation sequencing (NGS) experiments, however, is increasing rapidly. Recent studies based on NGS technology have routinely produced exome or who...


Bibliographic Details
Main Authors:	Choi, Jongpill; Park, Kiejung; Cho, Seong Beom; Chung, Myungguen
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2015
Subjects:	Software Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4688996/ https://www.ncbi.nlm.nih.gov/pubmed/26702294 http://dx.doi.org/10.1186/s13015-015-0062-4

_version_	1782406774543876096
author	Choi, Jongpill; Park, Kiejung; Cho, Seong Beom; Chung, Myungguen
author_facet	Choi, Jongpill; Park, Kiejung; Cho, Seong Beom; Chung, Myungguen
author_sort	Choi, Jongpill
collection	PubMed
description	BACKGROUND: A number of alignment tools have been developed to align sequencing reads to the human reference genome. The scale of information from next-generation sequencing (NGS) experiments, however, is increasing rapidly. Recent studies based on NGS technology have routinely produced exome or whole-genome sequences from several hundred or several thousand samples. To accommodate the increasing need to analyze very large NGS data sets, it is necessary to develop faster, more sensitive, and more accurate mapping tools. RESULTS: HIA uses two indices, a hash table index and a suffix array index. The hash table performs direct lookup of a q-gram, and the suffix array performs very fast lookup of variable-length strings by exploiting binary search. We observed that combining the hash table and the suffix array (the hybrid index) is much faster than the suffix array alone for finding a substring in the reference sequence. We define a matching region (MR) as a longest common substring between a reference and a read, and a candidate alignment region (CAR) as a list of MRs that are close to each other. The hybrid index is used to find CARs between a reference and a read. We found that aligning only the unmatched regions in a CAR is much faster than aligning the whole CAR. In benchmark analysis, HIA outperformed the other aligners in mapping speed, without significant loss of mapping accuracy. CONCLUSIONS: Our experiments show that the hybrid of a hash table and a suffix array is useful in terms of speed for mapping NGS reads to the human reference genome sequence. Our tool is therefore appropriate for aligning massive data sets generated by NGS. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13015-015-0062-4) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-4688996
institution	National Center for Biotechnology Information
language	English
publishDate	2015
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-4688996 2015-12-24 HIA: a genome mapper using hybrid index-based sequence alignment Choi, Jongpill; Park, Kiejung; Cho, Seong Beom; Chung, Myungguen Algorithms Mol Biol Software Article BACKGROUND: A number of alignment tools have been developed to align sequencing reads to the human reference genome. The scale of information from next-generation sequencing (NGS) experiments, however, is increasing rapidly. Recent studies based on NGS technology have routinely produced exome or whole-genome sequences from several hundred or several thousand samples. To accommodate the increasing need to analyze very large NGS data sets, it is necessary to develop faster, more sensitive, and more accurate mapping tools. RESULTS: HIA uses two indices, a hash table index and a suffix array index. The hash table performs direct lookup of a q-gram, and the suffix array performs very fast lookup of variable-length strings by exploiting binary search. We observed that combining the hash table and the suffix array (the hybrid index) is much faster than the suffix array alone for finding a substring in the reference sequence. We define a matching region (MR) as a longest common substring between a reference and a read, and a candidate alignment region (CAR) as a list of MRs that are close to each other. The hybrid index is used to find CARs between a reference and a read. We found that aligning only the unmatched regions in a CAR is much faster than aligning the whole CAR. In benchmark analysis, HIA outperformed the other aligners in mapping speed, without significant loss of mapping accuracy. CONCLUSIONS: Our experiments show that the hybrid of a hash table and a suffix array is useful in terms of speed for mapping NGS reads to the human reference genome sequence. Our tool is therefore appropriate for aligning massive data sets generated by NGS.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13015-015-0062-4) contains supplementary material, which is available to authorized users. BioMed Central 2015-12-23 /pmc/articles/PMC4688996/ /pubmed/26702294 http://dx.doi.org/10.1186/s13015-015-0062-4 Text en © Choi et al. 2015 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle	Software Article Choi, Jongpill Park, Kiejung Cho, Seong Beom Chung, Myungguen HIA: a genome mapper using hybrid index-based sequence alignment
title	HIA: a genome mapper using hybrid index-based sequence alignment
title_full	HIA: a genome mapper using hybrid index-based sequence alignment
title_fullStr	HIA: a genome mapper using hybrid index-based sequence alignment
title_full_unstemmed	HIA: a genome mapper using hybrid index-based sequence alignment
title_short	HIA: a genome mapper using hybrid index-based sequence alignment
title_sort	hia: a genome mapper using hybrid index-based sequence alignment
topic	Software Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4688996/ https://www.ncbi.nlm.nih.gov/pubmed/26702294 http://dx.doi.org/10.1186/s13015-015-0062-4
work_keys_str_mv	AT choijongpill hiaagenomemapperusinghybridindexbasedsequencealignment AT parkkiejung hiaagenomemapperusinghybridindexbasedsequencealignment AT choseongbeom hiaagenomemapperusinghybridindexbasedsequencealignment AT chungmyungguen hiaagenomemapperusinghybridindexbasedsequencealignment
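
Illustrative note on the hybrid index described in the abstract: the record does not give HIA's implementation, so the following is only a minimal Python sketch of one plausible reading, in which the hash table maps each q-gram to the block of suffix-array rows that start with it, and binary search then runs only inside that block. The function names, the naive suffix-array construction, and the toy reference string are all assumptions made for illustration, not taken from the article.

```python
def build_suffix_array(text):
    """Naive suffix array: start positions of all suffixes, sorted lexicographically.
    Fine for a toy reference; a real genome needs a dedicated construction algorithm."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def build_qgram_table(text, sa, q):
    """Hash table from each q-gram to the half-open interval [lo, hi) of
    suffix-array rows whose suffixes begin with that q-gram.
    Suffixes sharing a q-gram prefix are contiguous in sorted order."""
    table = {}
    for row, pos in enumerate(sa):
        gram = text[pos:pos + q]
        if len(gram) < q:
            continue  # suffix shorter than q cannot start with a full q-gram
        lo, _ = table.get(gram, (row, row))
        table[gram] = (lo, row + 1)
    return table


def _bound(text, sa, pattern, lo, hi, strict):
    """Binary search in sa[lo:hi] on suffixes truncated to len(pattern).
    strict=False: first row whose prefix is >= pattern.
    strict=True:  first row whose prefix is >  pattern."""
    m = len(pattern)
    while lo < hi:
        mid = (lo + hi) // 2
        prefix = text[sa[mid]:sa[mid] + m]
        if prefix < pattern or (strict and prefix == pattern):
            lo = mid + 1
        else:
            hi = mid
    return lo


def find_occurrences(text, sa, table, q, pattern):
    """Return sorted start positions of pattern in text (len(pattern) >= q).
    Step 1: direct hash lookup of the first q characters narrows the range.
    Step 2: binary search inside that range resolves the remaining characters."""
    if len(pattern) < q:
        raise ValueError("pattern must be at least q characters long")
    interval = table.get(pattern[:q])
    if interval is None:
        return []
    lo, hi = interval
    left = _bound(text, sa, pattern, lo, hi, strict=False)
    right = _bound(text, sa, pattern, lo, hi, strict=True)
    return sorted(sa[left:right])


# Toy usage with a hypothetical reference and q = 3.
reference = "ACGTACGTTACG"
suffix_array = build_suffix_array(reference)
qgram_table = build_qgram_table(reference, suffix_array, 3)
print(find_occurrences(reference, suffix_array, qgram_table, 3, "ACGT"))
```

In this sketch the hash lookup replaces the first several binary-search steps over the whole suffix array, which is one way the combination could be faster than the suffix array alone; whether HIA narrows the search in exactly this way is not stated in the record.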
