
Using quality scores and longer reads improves accuracy of Solexa read mapping

BACKGROUND: Second-generation sequencing has the potential to revolutionize genomics and impact all areas of biomedical science. New technologies will make re-sequencing widely available for such applications as identifying genome variations or interrogating the oligonucleotide content of a large sa...


Bibliographic Details
Main authors: Smith, Andrew D; Xuan, Zhenyu; Zhang, Michael Q
Format: Text
Language: English
Published: BioMed Central 2008
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2335322/
https://www.ncbi.nlm.nih.gov/pubmed/18307793
http://dx.doi.org/10.1186/1471-2105-9-128
author Smith, Andrew D
Xuan, Zhenyu
Zhang, Michael Q
collection PubMed
description BACKGROUND: Second-generation sequencing has the potential to revolutionize genomics and impact all areas of biomedical science. New technologies will make re-sequencing widely available for such applications as identifying genome variations or interrogating the oligonucleotide content of a large sample (e.g. ChIP-sequencing). The increase in speed, sensitivity and availability of sequencing technology brings demand for advances in computational technology to perform associated analysis tasks. The Solexa/Illumina 1G sequencer can produce tens of millions of reads, ranging in length from ~25–50 nt, in a single experiment. Accurately mapping the reads back to a reference genome is a critical task in almost all applications. Two sources of information that are often ignored when mapping reads from the Solexa technology are the 3' ends of longer reads, which contain a much higher frequency of sequencing errors, and the base-call quality scores. RESULTS: To investigate whether these sources of information can be used to improve accuracy when mapping reads, we developed the RMAP tool, which can map reads having a wide range of lengths and allows base-call quality scores to determine which positions in each read are more important when mapping. We applied RMAP to analyze data re-sequenced from two human BAC regions for varying read lengths, and varying criteria for use of quality scores. RMAP is freely available for downloading at . CONCLUSION: Our results indicate that significant gains in Solexa read mapping performance can be achieved by considering the information in 3' ends of longer reads, and appropriately using the base-call quality scores. The RMAP tool we have developed will enable researchers to effectively exploit this information in targeted re-sequencing projects.
format Text
id pubmed-2335322
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2335322 2008-04-26 Using quality scores and longer reads improves accuracy of Solexa read mapping Smith, Andrew D Xuan, Zhenyu Zhang, Michael Q BMC Bioinformatics Research Article
BioMed Central 2008-02-28 /pmc/articles/PMC2335322/ /pubmed/18307793 http://dx.doi.org/10.1186/1471-2105-9-128 Text en Copyright © 2008 Smith et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Using quality scores and longer reads improves accuracy of Solexa read mapping
topic Research Article