Using quality scores and longer reads improves accuracy of Solexa read mapping
BACKGROUND: Second-generation sequencing has the potential to revolutionize genomics and impact all areas of biomedical science. New technologies will make re-sequencing widely available for such applications as identifying genome variations or interrogating the oligonucleotide content of a large sa...
Main authors: | Smith, Andrew D; Xuan, Zhenyu; Zhang, Michael Q |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2008 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2335322/ https://www.ncbi.nlm.nih.gov/pubmed/18307793 http://dx.doi.org/10.1186/1471-2105-9-128 |
_version_ | 1782152822506127360 |
---|---|
author | Smith, Andrew D Xuan, Zhenyu Zhang, Michael Q |
author_facet | Smith, Andrew D Xuan, Zhenyu Zhang, Michael Q |
author_sort | Smith, Andrew D |
collection | PubMed |
description | BACKGROUND: Second-generation sequencing has the potential to revolutionize genomics and impact all areas of biomedical science. New technologies will make re-sequencing widely available for such applications as identifying genome variations or interrogating the oligonucleotide content of a large sample (e.g. ChIP-sequencing). The increase in speed, sensitivity and availability of sequencing technology brings demand for advances in computational technology to perform associated analysis tasks. The Solexa/Illumina 1G sequencer can produce tens of millions of reads, ranging in length from ~25–50 nt, in a single experiment. Accurately mapping the reads back to a reference genome is a critical task in almost all applications. Two sources of information that are often ignored when mapping reads from the Solexa technology are the 3' ends of longer reads, which contain a much higher frequency of sequencing errors, and the base-call quality scores. RESULTS: To investigate whether these sources of information can be used to improve accuracy when mapping reads, we developed the RMAP tool, which can map reads having a wide range of lengths and allows base-call quality scores to determine which positions in each read are more important when mapping. We applied RMAP to analyze data re-sequenced from two human BAC regions for varying read lengths, and varying criteria for use of quality scores. RMAP is freely available for downloading at . CONCLUSION: Our results indicate that significant gains in Solexa read mapping performance can be achieved by considering the information in 3' ends of longer reads, and appropriately using the base-call quality scores. The RMAP tool we have developed will enable researchers to effectively exploit this information in targeted re-sequencing projects. |
format | Text |
id | pubmed-2335322 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2008 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-2335322 2008-04-26 Using quality scores and longer reads improves accuracy of Solexa read mapping Smith, Andrew D Xuan, Zhenyu Zhang, Michael Q BMC Bioinformatics Research Article BACKGROUND: Second-generation sequencing has the potential to revolutionize genomics and impact all areas of biomedical science. New technologies will make re-sequencing widely available for such applications as identifying genome variations or interrogating the oligonucleotide content of a large sample (e.g. ChIP-sequencing). The increase in speed, sensitivity and availability of sequencing technology brings demand for advances in computational technology to perform associated analysis tasks. The Solexa/Illumina 1G sequencer can produce tens of millions of reads, ranging in length from ~25–50 nt, in a single experiment. Accurately mapping the reads back to a reference genome is a critical task in almost all applications. Two sources of information that are often ignored when mapping reads from the Solexa technology are the 3' ends of longer reads, which contain a much higher frequency of sequencing errors, and the base-call quality scores. RESULTS: To investigate whether these sources of information can be used to improve accuracy when mapping reads, we developed the RMAP tool, which can map reads having a wide range of lengths and allows base-call quality scores to determine which positions in each read are more important when mapping. We applied RMAP to analyze data re-sequenced from two human BAC regions for varying read lengths, and varying criteria for use of quality scores. RMAP is freely available for downloading at . CONCLUSION: Our results indicate that significant gains in Solexa read mapping performance can be achieved by considering the information in 3' ends of longer reads, and appropriately using the base-call quality scores. The RMAP tool we have developed will enable researchers to effectively exploit this information in targeted re-sequencing projects.
BioMed Central 2008-02-28 /pmc/articles/PMC2335322/ /pubmed/18307793 http://dx.doi.org/10.1186/1471-2105-9-128 Text en Copyright © 2008 Smith et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Smith, Andrew D Xuan, Zhenyu Zhang, Michael Q Using quality scores and longer reads improves accuracy of Solexa read mapping |
title | Using quality scores and longer reads improves accuracy of Solexa read mapping |
title_full | Using quality scores and longer reads improves accuracy of Solexa read mapping |
title_fullStr | Using quality scores and longer reads improves accuracy of Solexa read mapping |
title_full_unstemmed | Using quality scores and longer reads improves accuracy of Solexa read mapping |
title_short | Using quality scores and longer reads improves accuracy of Solexa read mapping |
title_sort | using quality scores and longer reads improves accuracy of solexa read mapping |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2335322/ https://www.ncbi.nlm.nih.gov/pubmed/18307793 http://dx.doi.org/10.1186/1471-2105-9-128 |
work_keys_str_mv | AT smithandrewd usingqualityscoresandlongerreadsimprovesaccuracyofsolexareadmapping AT xuanzhenyu usingqualityscoresandlongerreadsimprovesaccuracyofsolexareadmapping AT zhangmichaelq usingqualityscoresandlongerreadsimprovesaccuracyofsolexareadmapping |
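
The abstract in this record says that base-call quality scores can determine which read positions matter more when mapping. The following is a minimal Python sketch of that general idea only, not RMAP's actual algorithm or interface: it assumes a simple rule in which positions below a quality cutoff are ignored when counting mismatches. The function names, the `min_quality` cutoff, and the `max_mismatches` limit are all hypothetical and chosen for illustration.

```python
def quality_aware_mismatches(read, quals, ref_window, min_quality=10):
    """Count mismatches between a read and a reference window.

    Positions whose base-call quality is below min_quality are skipped,
    so low-confidence bases (often at the 3' end) cannot add mismatches.
    This masking rule is an assumption, not the rule used by RMAP.
    """
    if not (len(read) == len(quals) == len(ref_window)):
        raise ValueError("read, quals and ref_window must have equal length")
    return sum(
        1
        for base, qual, ref_base in zip(read, quals, ref_window)
        if qual >= min_quality and base != ref_base
    )


def map_read(read, quals, reference, max_mismatches=2, min_quality=10):
    """Scan the reference and return (position, mismatches) for the
    unique best hit, or None if there is no hit or the best is ambiguous.

    Brute-force scan for clarity; a real mapper would use an index.
    """
    best_positions = []
    best_score = max_mismatches + 1  # anything above the limit is rejected
    n = len(read)
    for pos in range(len(reference) - n + 1):
        score = quality_aware_mismatches(
            read, quals, reference[pos:pos + n], min_quality
        )
        if score < best_score:
            best_score = score
            best_positions = [pos]
        elif score == best_score and best_positions:
            best_positions.append(pos)
    if len(best_positions) == 1:
        return best_positions[0], best_score
    return None


if __name__ == "__main__":
    reference = "ACGTACGTTTGACCA"
    read = "TTGAG"                # last base is a sequencing error
    quals = [30, 30, 30, 30, 2]   # and it carries a low quality score
    print(map_read(read, quals, reference, max_mismatches=0, min_quality=10))
    print(map_read(read, quals, reference, max_mismatches=0, min_quality=0))
```

In this toy case the low-quality final base is masked when the cutoff is 10, so the read still maps with a strict mismatch limit; with the cutoff set to 0 every position counts and the same read fails to map.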