Cargando…
Fast and efficient short read mapping based on a succinct hash index
BACKGROUND: Various indexing techniques have been applied by next generation sequencing read mapping tools. The choice of a particular data structure is a trade-off between memory consumption, mapping throughput, and construction time. RESULTS: We present the succinct hash index – a novel data struc...
Autores principales: | , , , , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
BioMed Central
2018
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5845352/ https://www.ncbi.nlm.nih.gov/pubmed/29523083 http://dx.doi.org/10.1186/s12859-018-2094-5 |
_version_ | 1783305412620058624 |
---|---|
author | Zhang, Haowen Chan, Yuandong Fan, Kaichao Schmidt, Bertil Liu, Weiguo |
author_facet | Zhang, Haowen Chan, Yuandong Fan, Kaichao Schmidt, Bertil Liu, Weiguo |
author_sort | Zhang, Haowen |
collection | PubMed |
description | BACKGROUND: Various indexing techniques have been applied by next generation sequencing read mapping tools. The choice of a particular data structure is a trade-off between memory consumption, mapping throughput, and construction time. RESULTS: We present the succinct hash index – a novel data structure for read mapping which is a variant of the classical q-gram index with a particularly small memory footprint occupying between 3.5 and 5.3 GB for a human reference genome for typical parameter settings. The succinct hash index features two novel seed selection algorithms (group seeding and variable-length seeding) and an efficient parallel construction algorithm, which we have implemented to design the FEM (Fast(F) and Efficient(E) read Mapper(M)) mapper. FEM can return all read mappings within a given edit distance. Our experimental results show that FEM is scalable and outperforms other state-of-the-art all-mappers in terms of both speed and memory footprint. Compared to Masai, FEM is an order-of-magnitude faster using a single thread and two orders-of-magnitude faster when using multiple threads. Furthermore, we observe an up to 2.8-fold speedup compared to BitMapper and an order-of-magnitude speedup compared to BitMapper2 and Hobbes3. CONCLUSIONS: The presented succinct index is the first feasible implementation of the q-gram index functionality that occupies around 3.5 GB of memory for a whole human reference genome. FEM is freely available at https://github.com/haowenz/FEM. |
format | Online Article Text |
id | pubmed-5845352 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-58453522018-03-19 Fast and efficient short read mapping based on a succinct hash index Zhang, Haowen Chan, Yuandong Fan, Kaichao Schmidt, Bertil Liu, Weiguo BMC Bioinformatics Research Article BACKGROUND: Various indexing techniques have been applied by next generation sequencing read mapping tools. The choice of a particular data structure is a trade-off between memory consumption, mapping throughput, and construction time. RESULTS: We present the succinct hash index – a novel data structure for read mapping which is a variant of the classical q-gram index with a particularly small memory footprint occupying between 3.5 and 5.3 GB for a human reference genome for typical parameter settings. The succinct hash index features two novel seed selection algorithms (group seeding and variable-length seeding) and an efficient parallel construction algorithm, which we have implemented to design the FEM (Fast(F) and Efficient(E) read Mapper(M)) mapper. FEM can return all read mappings within a given edit distance. Our experimental results show that FEM is scalable and outperforms other state-of-the-art all-mappers in terms of both speed and memory footprint. Compared to Masai, FEM is an order-of-magnitude faster using a single thread and two orders-of-magnitude faster when using multiple threads. Furthermore, we observe an up to 2.8-fold speedup compared to BitMapper and an order-of-magnitude speedup compared to BitMapper2 and Hobbes3. CONCLUSIONS: The presented succinct index is the first feasible implementation of the q-gram index functionality that occupies around 3.5 GB of memory for a whole human reference genome. FEM is freely available at https://github.com/haowenz/FEM. BioMed Central 2018-03-09 /pmc/articles/PMC5845352/ /pubmed/29523083 http://dx.doi.org/10.1186/s12859-018-2094-5 Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Article Zhang, Haowen Chan, Yuandong Fan, Kaichao Schmidt, Bertil Liu, Weiguo Fast and efficient short read mapping based on a succinct hash index |
title | Fast and efficient short read mapping based on a succinct hash index |
title_full | Fast and efficient short read mapping based on a succinct hash index |
title_fullStr | Fast and efficient short read mapping based on a succinct hash index |
title_full_unstemmed | Fast and efficient short read mapping based on a succinct hash index |
title_short | Fast and efficient short read mapping based on a succinct hash index |
title_sort | fast and efficient short read mapping based on a succinct hash index |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5845352/ https://www.ncbi.nlm.nih.gov/pubmed/29523083 http://dx.doi.org/10.1186/s12859-018-2094-5 |
work_keys_str_mv | AT zhanghaowen fastandefficientshortreadmappingbasedonasuccincthashindex AT chanyuandong fastandefficientshortreadmappingbasedonasuccincthashindex AT fankaichao fastandefficientshortreadmappingbasedonasuccincthashindex AT schmidtbertil fastandefficientshortreadmappingbasedonasuccincthashindex AT liuweiguo fastandefficientshortreadmappingbasedonasuccincthashindex |