
smsMap: mapping single molecule sequencing reads by locating the alignment starting positions

BACKGROUND: Single Molecule Sequencing (SMS) technology can produce longer reads with higher sequencing error rate. Mapping these reads to a reference genome is often the most fundamental and computing-intensive step for downstream analysis. Most existing mapping tools generally adopt the traditiona...


Bibliographic Details
Main authors: Wei, Ze-Gang, Zhang, Shao-Wu, Liu, Fei
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7430848/
https://www.ncbi.nlm.nih.gov/pubmed/32753028
http://dx.doi.org/10.1186/s12859-020-03698-w
author Wei, Ze-Gang
Zhang, Shao-Wu
Liu, Fei
collection PubMed
description BACKGROUND: Single Molecule Sequencing (SMS) technology can produce longer reads with higher sequencing error rate. Mapping these reads to a reference genome is often the most fundamental and computing-intensive step for downstream analysis. Most existing mapping tools generally adopt the traditional seed-and-extend strategy, and the candidate aligned regions for each query read are selected either by counting the number of matched seeds or chaining a group of seeds. However, for all the existing mapping tools, the coverage ratio of the alignment region to the query read is lower, and the read alignment quality and efficiency need to be improved. Here, we introduce smsMap, a novel mapping tool that is specifically designed to map the long reads of SMS to a reference genome. RESULTS: smsMap was evaluated with other existing seven SMS mapping tools (e.g., BLASR, minimap2, and BWA-MEM) on both simulated and real-life SMS datasets. The experimental results show that smsMap can efficiently achieve higher aligned read coverage ratio and has higher sensitivity that can align more sequences and bases to the reference genome. Additionally, smsMap is more robust to sequencing errors. CONCLUSIONS: smsMap is computationally efficient to align SMS reads, especially for the larger size of the reference genome (e.g., H. sapiens genome with over 3 billion base pairs). The source code of smsMap can be freely downloaded from https://github.com/NWPU-903PR/smsMap.
format Online
Article
Text
id pubmed-7430848
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7430848 2020-08-18 smsMap: mapping single molecule sequencing reads by locating the alignment starting positions Wei, Ze-Gang Zhang, Shao-Wu Liu, Fei BMC Bioinformatics Methodology Article BACKGROUND: Single Molecule Sequencing (SMS) technology can produce longer reads with higher sequencing error rate. Mapping these reads to a reference genome is often the most fundamental and computing-intensive step for downstream analysis. Most existing mapping tools generally adopt the traditional seed-and-extend strategy, and the candidate aligned regions for each query read are selected either by counting the number of matched seeds or chaining a group of seeds. However, for all the existing mapping tools, the coverage ratio of the alignment region to the query read is lower, and the read alignment quality and efficiency need to be improved. Here, we introduce smsMap, a novel mapping tool that is specifically designed to map the long reads of SMS to a reference genome. RESULTS: smsMap was evaluated with other existing seven SMS mapping tools (e.g., BLASR, minimap2, and BWA-MEM) on both simulated and real-life SMS datasets. The experimental results show that smsMap can efficiently achieve higher aligned read coverage ratio and has higher sensitivity that can align more sequences and bases to the reference genome. Additionally, smsMap is more robust to sequencing errors. CONCLUSIONS: smsMap is computationally efficient to align SMS reads, especially for the larger size of the reference genome (e.g., H. sapiens genome with over 3 billion base pairs). The source code of smsMap can be freely downloaded from https://github.com/NWPU-903PR/smsMap.
BioMed Central 2020-08-04 /pmc/articles/PMC7430848/ /pubmed/32753028 http://dx.doi.org/10.1186/s12859-020-03698-w Text en © The Author(s) 2020. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title smsMap: mapping single molecule sequencing reads by locating the alignment starting positions
topic Methodology Article