Incorporating sequence quality data into alignment improves DNA read mapping

New DNA sequencing technologies have achieved breakthroughs in throughput, at the expense of higher error rates. The primary way of interpreting biological sequences is via alignment, but standard alignment methods assume the sequences are accurate. Here, we describe how to incorporate the per-base...

Bibliographic Details
Main authors: Frith, Martin C., Wan, Raymond, Horton, Paul
Format: Text
Language: English
Published: Oxford University Press 2010
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2853142/
https://www.ncbi.nlm.nih.gov/pubmed/20110255
http://dx.doi.org/10.1093/nar/gkq010
_version_ 1782180018329223168
author Frith, Martin C.
Wan, Raymond
Horton, Paul
author_facet Frith, Martin C.
Wan, Raymond
Horton, Paul
author_sort Frith, Martin C.
collection PubMed
description New DNA sequencing technologies have achieved breakthroughs in throughput, at the expense of higher error rates. The primary way of interpreting biological sequences is via alignment, but standard alignment methods assume the sequences are accurate. Here, we describe how to incorporate the per-base error probabilities reported by sequencers into alignment. Unlike existing tools for DNA read mapping, our method models both sequencer errors and real sequence differences. This approach consistently improves mapping accuracy, even when the rate of real sequence difference is only 0.2%. Furthermore, when mapping Drosophila melanogaster reads to the Drosophila simulans genome, it increased the amount of correctly mapped reads from 49 to 66%. This approach enables more effective use of DNA reads from organisms that lack reference genomes, are extinct or are highly polymorphic.
format Text
id pubmed-2853142
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-2853142 2010-04-12 Incorporating sequence quality data into alignment improves DNA read mapping Frith, Martin C. Wan, Raymond Horton, Paul Nucleic Acids Res Methods Online New DNA sequencing technologies have achieved breakthroughs in throughput, at the expense of higher error rates. The primary way of interpreting biological sequences is via alignment, but standard alignment methods assume the sequences are accurate. Here, we describe how to incorporate the per-base error probabilities reported by sequencers into alignment. Unlike existing tools for DNA read mapping, our method models both sequencer errors and real sequence differences. This approach consistently improves mapping accuracy, even when the rate of real sequence difference is only 0.2%. Furthermore, when mapping Drosophila melanogaster reads to the Drosophila simulans genome, it increased the amount of correctly mapped reads from 49 to 66%. This approach enables more effective use of DNA reads from organisms that lack reference genomes, are extinct or are highly polymorphic. Oxford University Press 2010-04 2010-01-27 /pmc/articles/PMC2853142/ /pubmed/20110255 http://dx.doi.org/10.1093/nar/gkq010 Text en © The Author(s) 2010. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/2.5 This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Methods Online
Frith, Martin C.
Wan, Raymond
Horton, Paul
Incorporating sequence quality data into alignment improves DNA read mapping
title Incorporating sequence quality data into alignment improves DNA read mapping
title_sort incorporating sequence quality data into alignment improves dna read mapping
topic Methods Online
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2853142/
https://www.ncbi.nlm.nih.gov/pubmed/20110255
http://dx.doi.org/10.1093/nar/gkq010
work_keys_str_mv AT frithmartinc incorporatingsequencequalitydataintoalignmentimprovesdnareadmapping
AT wanraymond incorporatingsequencequalitydataintoalignmentimprovesdnareadmapping
AT hortonpaul incorporatingsequencequalitydataintoalignmentimprovesdnareadmapping
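
The description in this record mentions incorporating per-base error probabilities reported by sequencers into alignment. As a rough illustration of that idea, and not the authors' published implementation, the sketch below converts Phred quality values to error probabilities (the standard relation p = 10^(-Q/10)) and then blends match and mismatch likelihood ratios according to that probability. The function names, the default scores (1 and -1), the scale parameter, and the assumption that a miscalled base is equally likely to be any of the three other bases are all illustrative assumptions, not taken from the record.

```python
import math


def phred_to_error_prob(q):
    """Standard Phred relation: Q = -10 * log10(p), so p = 10 ** (-Q / 10)."""
    return 10.0 ** (-q / 10.0)


def fastq_qualities(quality_string, offset=33):
    """Decode a FASTQ quality string (Sanger encoding, ASCII offset 33)."""
    return [ord(c) - offset for c in quality_string]


def quality_adjusted_score(is_match, error_prob,
                           match_score=1.0, mismatch_score=-1.0, scale=1.0):
    """Hypothetical quality-aware substitution score.

    Assumes the scores are scaled log likelihood ratios, and that a
    sequencing error replaces the true base with one of the other three
    bases uniformly at random. Both are assumptions of this sketch.
    """
    r_match = math.exp(match_score / scale)
    r_mis = math.exp(mismatch_score / scale)
    e = error_prob
    if is_match:
        # Read base equals the reference base: it is either correct
        # (a real match) or a miscall of some other base (a real mismatch).
        ratio = (1.0 - e) * r_match + e * r_mis
    else:
        # Read base differs from the reference base: it is either correct
        # (a real mismatch) or a miscall; one of the three alternative
        # true bases would match the reference.
        ratio = (1.0 - e) * r_mis + (e / 3.0) * (r_match + 2.0 * r_mis)
    return scale * math.log(ratio)


if __name__ == "__main__":
    for q in fastq_qualities("I5+!"):
        p = phred_to_error_prob(q)
        print(q, round(p, 4),
              round(quality_adjusted_score(True, p), 3),
              round(quality_adjusted_score(False, p), 3))
```

Under these assumptions, a high-quality base keeps nearly the full match or mismatch score, while a low-quality base pulls both scores toward each other, so an apparent mismatch at an unreliable position is penalised less than one at a reliable position.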