Fast Mapping of Short Sequences with Mismatches, Insertions and Deletions Using Index Structures

With few exceptions, current methods for short read mapping make use of simple seed heuristics to speed up the search. Most of the underlying matching models neglect the necessity to allow not only mismatches, but also insertions and deletions. Current evaluations indicate, however, that very differ...

Bibliographic Details
Main Authors:	Hoffmann, Steve; Otto, Christian; Kurtz, Stefan; Sharma, Cynthia M.; Khaitovich, Philipp; Vogel, Jörg; Stadler, Peter F.; Hackermüller, Jörg
Format:	Text
Language:	English
Published:	Public Library of Science 2009
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2730575/ https://www.ncbi.nlm.nih.gov/pubmed/19750212 http://dx.doi.org/10.1371/journal.pcbi.1000502

author	Hoffmann, Steve; Otto, Christian; Kurtz, Stefan; Sharma, Cynthia M.; Khaitovich, Philipp; Vogel, Jörg; Stadler, Peter F.; Hackermüller, Jörg
collection	PubMed
description	With few exceptions, current methods for short read mapping make use of simple seed heuristics to speed up the search. Most of the underlying matching models neglect the necessity to allow not only mismatches, but also insertions and deletions. Current evaluations indicate, however, that very different error models apply to the novel high-throughput sequencing methods. While the most frequent error-type in Illumina reads are mismatches, reads produced by 454's GS FLX predominantly contain insertions and deletions (indels). Even though 454 sequencers are able to produce longer reads, the method is frequently applied to small RNA (miRNA and siRNA) sequencing. Fast and accurate matching in particular of short reads with diverse errors is therefore a pressing practical problem. We introduce a matching model for short reads that can, besides mismatches, also cope with indels. It addresses different error models. For example, it can handle the problem of leading and trailing contaminations caused by primers and poly-A tails in transcriptomics or the length-dependent increase of error rates. In these contexts, it thus simplifies the tedious and error-prone trimming step. For efficient searches, our method utilizes index structures in the form of enhanced suffix arrays. In a comparison with current methods for short read mapping, the presented approach shows significantly increased performance not only for 454 reads, but also for Illumina reads. Our approach is implemented in the software segemehl available at http://www.bioinf.uni-leipzig.de/Software/segemehl/.
format	Text
id	pubmed-2730575
institution	National Center for Biotechnology Information
language	English
publishDate	2009
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-2730575 2009-09-11 Fast Mapping of Short Sequences with Mismatches, Insertions and Deletions Using Index Structures. Hoffmann, Steve; Otto, Christian; Kurtz, Stefan; Sharma, Cynthia M.; Khaitovich, Philipp; Vogel, Jörg; Stadler, Peter F.; Hackermüller, Jörg. PLoS Comput Biol. Research Article.
Public Library of Science 2009-09-11 /pmc/articles/PMC2730575/ /pubmed/19750212 http://dx.doi.org/10.1371/journal.pcbi.1000502 Text en Hoffmann et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title	Fast Mapping of Short Sequences with Mismatches, Insertions and Deletions Using Index Structures
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2730575/ https://www.ncbi.nlm.nih.gov/pubmed/19750212 http://dx.doi.org/10.1371/journal.pcbi.1000502
