Re-alignment of the unmapped reads with base quality score

MOTIVATION: Next-generation genome sequencing technologies have enabled a variety of biological applications, and alignment is the first step once the sequencing reads are obtained. In recent years, many software tools have been developed to efficiently and accurately align short re...

Bibliographic Details
Main Authors: Peng, Xiaoqing, Wang, Jianxin, Zhang, Zhen, Xiao, Qianghua, Li, Min, Pan, Yi
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4402702/
https://www.ncbi.nlm.nih.gov/pubmed/25860434
http://dx.doi.org/10.1186/1471-2105-16-S5-S8
author Peng, Xiaoqing
Wang, Jianxin
Zhang, Zhen
Xiao, Qianghua
Li, Min
Pan, Yi
collection PubMed
description MOTIVATION: Next-generation genome sequencing technologies have enabled a variety of biological applications, and alignment is the first step once the sequencing reads are obtained. In recent years, many software tools have been developed to align short reads to the reference genome efficiently and accurately. However, many reads still cannot be mapped to the reference genome because they exceed the allowable number of mismatches. Moreover, besides the unmapped reads, reads with low mapping quality are also excluded from downstream analysis, such as variant calling. If the confident segments of these reads can be exploited, not only can alignment rates be improved, but more information will also be available for downstream analysis. RESULTS: This paper proposes a method, called RAUR (Re-align the Unmapped Reads), to re-align reads that cannot be mapped by alignment tools. First, it uses the base quality scores (reported by the sequencer) to identify the most confident and informative segments of the unmapped reads by controlling the number of possible mismatches in the alignment. Then, combined with an alignment tool, RAUR re-aligns these segments. RAUR was run on both simulated and real data with different read lengths. The results show that many reads that fail to be aligned by the most popular alignment tools (BWA and Bowtie2) can be correctly re-aligned by RAUR, with similar precision. Even compared with BWA-MEM and the local mode of Bowtie2, which perform local alignment of long reads to improve the alignment rate, RAUR shows advantages in alignment rate and precision in some cases. Therefore, the trimming strategy used in RAUR is useful for improving the alignment rate of alignment tools for next-generation genome sequencing. AVAILABILITY: All source code is available at http://netlab.csu.edu.cn/bioinformatics/RAUR.html.
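The trimming idea in the description above (use base quality scores to pick a read segment with a bounded number of likely mismatches) can be illustrated with a short sketch. This is not RAUR's published algorithm: it assumes Phred+33 quality encoding, treats the sum of per-base error probabilities as the expected number of mismatches, and the function names, the mismatch budget, and the minimum length are all hypothetical choices made for illustration.

```python
def phred_to_error_prob(q):
    """Convert a Phred quality score Q to an error probability 10^(-Q/10)."""
    return 10 ** (-q / 10)


def decode_phred33(quality_string):
    """Decode a FASTQ quality string, assuming Phred+33 (Sanger) encoding."""
    return [ord(c) - 33 for c in quality_string]


def most_confident_segment(quals, max_expected_mismatches=1.0, min_length=20):
    """Return (start, end) of the longest contiguous segment whose summed
    per-base error probability stays within max_expected_mismatches.

    The interval is half-open, so read[start:end] is the segment.
    Returns None if no segment of at least min_length bases qualifies.
    Both thresholds are illustrative assumptions, not values from the paper.
    """
    probs = [phred_to_error_prob(q) for q in quals]
    best = None
    start = 0
    total = 0.0
    for end, p in enumerate(probs):
        total += p
        # Shrink the window from the left until it fits the mismatch budget.
        while start <= end and total > max_expected_mismatches:
            total -= probs[start]
            start += 1
        length = end - start + 1
        if length >= min_length and (best is None or length > best[1] - best[0]):
            best = (start, end + 1)
    return best
```

A segment selected this way could then be passed to an aligner such as BWA or Bowtie2 in place of the full read; how RAUR itself chooses and re-aligns segments is described in the paper, not here.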
format Online
Article
Text
id pubmed-4402702
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4402702 2015-04-29 Re-alignment of the unmapped reads with base quality score Peng, Xiaoqing Wang, Jianxin Zhang, Zhen Xiao, Qianghua Li, Min Pan, Yi BMC Bioinformatics Proceedings BioMed Central 2015-03-18 /pmc/articles/PMC4402702/ /pubmed/25860434 http://dx.doi.org/10.1186/1471-2105-16-S5-S8 Text en Copyright © 2015 Peng et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Re-alignment of the unmapped reads with base quality score
topic Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4402702/
https://www.ncbi.nlm.nih.gov/pubmed/25860434
http://dx.doi.org/10.1186/1471-2105-16-S5-S8