The Subread aligner: fast, accurate and scalable read mapping by seed-and-vote

Read alignment is an ongoing challenge for the analysis of data from sequencing technologies. This article proposes an elegantly simple multi-seed strategy, called seed-and-vote, for mapping reads to a reference genome. The new strategy chooses the mapped genomic location for the read directly from...
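As an illustration of the seed-and-vote idea summarized in this record, the following is a minimal Python sketch, not the authors' implementation. It assumes an exact-match hash index over the reference, evenly spaced (possibly overlapping) subreads, and a simple vote count per candidate start position. The function names (`build_index`, `extract_subreads`, `seed_and_vote`) and the parameter values (`seed_len`, `n_seeds`, `min_votes`) are hypothetical choices made for this example. The sketch omits indel tolerance and the final detailed alignment step that fills in mismatches and indels between subreads.

```python
import random
from collections import Counter, defaultdict


def build_index(reference, seed_len):
    """Map every seed_len-mer in the reference to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - seed_len + 1):
        index[reference[pos:pos + seed_len]].append(pos)
    return index


def extract_subreads(read, seed_len, n_seeds):
    """Return evenly spaced (possibly overlapping) seeds as (offset, sequence) pairs."""
    last = len(read) - seed_len
    if last < 0:
        return []  # read is shorter than one seed
    if n_seeds == 1 or last == 0:
        offsets = [0]
    else:
        offsets = sorted({round(i * last / (n_seeds - 1)) for i in range(n_seeds)})
    return [(off, read[off:off + seed_len]) for off in offsets]


def seed_and_vote(read, index, seed_len=8, n_seeds=5, min_votes=2):
    """Let each subread vote for the read start position implied by its hits.

    Returns (start, votes) for the best-supported location, or None if no
    location collects at least min_votes votes. All defaults are illustrative.
    """
    votes = Counter()
    for offset, seed in extract_subreads(read, seed_len, n_seeds):
        for pos in index.get(seed, ()):
            # A seed found at reference position pos implies the read starts at pos - offset.
            votes[pos - offset] += 1
    if not votes:
        return None
    # Highest vote count wins; ties are broken in favor of the leftmost start.
    start, count = max(votes.items(), key=lambda kv: (kv[1], -kv[0]))
    return (start, count) if count >= min_votes else None


if __name__ == "__main__":
    rng = random.Random(0)
    reference = "".join(rng.choices("ACGT", k=200))  # toy reference, not real data
    index = build_index(reference, seed_len=8)
    read = reference[60:100]
    print(seed_and_vote(read, index))
```

Because the location is chosen by counting votes, a subread that fails to match (for example, one containing a sequencing error) only lowers the vote count of the true location rather than discarding the read.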

Bibliographic Details
Main authors: Liao, Yang; Smyth, Gordon K.; Shi, Wei
Format: Online Article Text
Language: English
Published: Oxford University Press 2013
Subjects: Methods Online
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3664803/
https://www.ncbi.nlm.nih.gov/pubmed/23558742
http://dx.doi.org/10.1093/nar/gkt214
author Liao, Yang
Smyth, Gordon K.
Shi, Wei
collection PubMed
description Read alignment is an ongoing challenge for the analysis of data from sequencing technologies. This article proposes an elegantly simple multi-seed strategy, called seed-and-vote, for mapping reads to a reference genome. The new strategy chooses the mapped genomic location for the read directly from the seeds. It uses a relatively large number of short seeds (called subreads) extracted from each read and allows all the seeds to vote on the optimal location. When the read length is <160 bp, overlapping subreads are used. More conventional alignment algorithms are then used to fill in detailed mismatch and indel information between the subreads that make up the winning voting block. The strategy is fast because the overall genomic location has already been chosen before the detailed alignment is done. It is sensitive because no individual subread is required to map exactly, nor are individual subreads constrained to map close by other subreads. It is accurate because the final location must be supported by several different subreads. The strategy extends easily to find exon junctions, by locating reads that contain sets of subreads mapping to different exons of the same gene. It scales up efficiently for longer reads.
format Online
Article
Text
id pubmed-3664803
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-3664803 2013-05-28 The Subread aligner: fast, accurate and scalable read mapping by seed-and-vote. Liao, Yang; Smyth, Gordon K.; Shi, Wei. Nucleic Acids Res. Methods Online. Oxford University Press 2013-05 2013-04-03 /pmc/articles/PMC3664803/ /pubmed/23558742 http://dx.doi.org/10.1093/nar/gkt214 Text en © The Author(s) 2013. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/3.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title The Subread aligner: fast, accurate and scalable read mapping by seed-and-vote
topic Methods Online
work_keys_str_mv AT liaoyang thesubreadalignerfastaccurateandscalablereadmappingbyseedandvote
AT smythgordonk thesubreadalignerfastaccurateandscalablereadmappingbyseedandvote
AT shiwei thesubreadalignerfastaccurateandscalablereadmappingbyseedandvote
AT liaoyang subreadalignerfastaccurateandscalablereadmappingbyseedandvote
AT smythgordonk subreadalignerfastaccurateandscalablereadmappingbyseedandvote
AT shiwei subreadalignerfastaccurateandscalablereadmappingbyseedandvote