kngMap: Sensitive and Fast Mapping Algorithm for Noisy Long Reads Based on the K-Mer Neighborhood Graph

Bibliographic Details
Main authors: Wei, Ze-Gang; Fan, Xing-Guo; Zhang, Hao; Zhang, Xiao-Dan; Liu, Fei; Qian, Yu; Zhang, Shao-Wu
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2022
Subjects: Genetics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9117619/
https://www.ncbi.nlm.nih.gov/pubmed/35601495
http://dx.doi.org/10.3389/fgene.2022.890651
author Wei, Ze-Gang
Fan, Xing-Guo
Zhang, Hao
Zhang, Xiao-Dan
Liu, Fei
Qian, Yu
Zhang, Shao-Wu
collection PubMed
description With the rapid development of single-molecule sequencing (SMS) technologies such as PacBio single-molecule real-time and Oxford Nanopore sequencing, read lengths continue to increase, which holds great potential for cutting-edge genomic applications. Mapping these reads to a reference genome is often the most fundamental and computationally intensive step for downstream analysis. However, these long reads have higher sequencing error rates and span the breakpoints of structural variants (SVs) more frequently than shorter reads do, so most state-of-the-art mappers leave many reads unaligned or only partially aligned. As a result, these methods usually produce local mapping results for the query read rather than a whole end-to-end alignment. We introduce kngMap, a novel k-mer neighborhood graph-based mapper specifically designed to align long, noisy SMS reads to a reference sequence. In extensive benchmarking experiments on both simulated and real SMS datasets comparing kngMap with ten other popular SMS mapping tools (e.g., BLASR, BWA-MEM, and minimap2), we demonstrated that kngMap has higher sensitivity, aligning more reads and bases to the reference genome; meanwhile, kngMap can produce consecutive alignments for the whole read that span different categories of SVs in the reads. kngMap is implemented in C++ and supports multi-threading; its source code is freely available for academic use at https://github.com/zhang134/kngMap.
format Online
Article
Text
id pubmed-9117619
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-9117619 2022-05-20 kngMap: Sensitive and Fast Mapping Algorithm for Noisy Long Reads Based on the K-Mer Neighborhood Graph. Wei, Ze-Gang; Fan, Xing-Guo; Zhang, Hao; Zhang, Xiao-Dan; Liu, Fei; Qian, Yu; Zhang, Shao-Wu. Front Genet. Genetics. Frontiers Media S.A.
2022-05-05 /pmc/articles/PMC9117619/ /pubmed/35601495 http://dx.doi.org/10.3389/fgene.2022.890651 Text en Copyright © 2022 Wei, Fan, Zhang, Zhang, Liu, Qian and Zhang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY), https://creativecommons.org/licenses/by/4.0/. The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title kngMap: Sensitive and Fast Mapping Algorithm for Noisy Long Reads Based on the K-Mer Neighborhood Graph
topic Genetics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9117619/
https://www.ncbi.nlm.nih.gov/pubmed/35601495
http://dx.doi.org/10.3389/fgene.2022.890651