Ψ-RA: a parallel sparse index for genomic read alignment

Bibliographic Details
Main authors:	Külekci, M. Oğuzhan; Hon, Wing-Kai; Shah, Rahul; Vitter, Jeffrey Scott; Xu, Bojian
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2011
Subjects:	Proceedings
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3194238/ https://www.ncbi.nlm.nih.gov/pubmed/21989248 http://dx.doi.org/10.1186/1471-2164-12-S2-S7

author	Külekci, M. Oğuzhan; Hon, Wing-Kai; Shah, Rahul; Vitter, Jeffrey Scott; Xu, Bojian
collection	PubMed
description	BACKGROUND: Genomic read alignment involves mapping (exactly or approximately) short reads from a particular individual onto a pre-sequenced reference genome of the same species. Because all individuals of the same species share the majority of their genomes, short-read alignment provides an alternative and much more efficient way to sequence the genome of a particular individual than direct sequencing does. Among the many strategies proposed for this alignment process, indexing the reference genome and searching for short reads over the index is a dominant technique. Our goal is to design a space-efficient indexing structure with fast search capability to handle the massive numbers of short reads produced by next-generation high-throughput DNA sequencing technology. RESULTS: We concentrate on indexing DNA sequences via sparse suffix arrays (SSAs) and propose a new short-read aligner named Ψ-RA (PSI-RA: parallel sparse index read aligner). The motivation for using SSAs is the ability to trade memory against time. It is possible to fine-tune the space consumption of the index based on the available memory of the machine and the minimum length of the arriving pattern queries. Although SSAs have been studied before for exact matching of short reads, an elegant way to support approximate matching was missing. We provide this by defining the rightmost mismatch criterion, which prioritizes errors towards the end of the reads, where errors are more probable. Ψ-RA supports any number of mismatches in aligning reads. We give comparisons with some of the well-known short-read aligners and show that indexing a genome with an SSA is a good alternative to Burrows-Wheeler transform or seed-based solutions. CONCLUSIONS: Ψ-RA is expected to serve as a valuable tool in the alignment of short reads generated by next-generation high-throughput sequencing technology. Ψ-RA is very fast in exact matching and also supports rightmost approximate matching.
The SSA structure on which Ψ-RA is built naturally lends itself to modern multicore architectures, so further speed-up can be gained. All the information, including the source code of Ψ-RA, can be downloaded at: http://www.busillis.com/o_kulekci/PSIRA.zip.
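The abstract's point that an SSA lets the index size be tuned to the available memory and to the minimum query length can be illustrated with a minimal Python sketch. It assumes a plain sampling scheme in which only suffixes starting at multiples of a step k are indexed and sorted naively; the names build_ssa, suffix_range and find_exact are hypothetical, and this is not the Ψ-RA source code. Because any read of length at least k must cover one sampled position, searching each of the k shifted queries pattern[j:] and then verifying the skipped prefix pattern[:j] recovers every exact occurrence. The k searches are independent, which suggests one way such an index could use multiple cores.

```python
# Minimal sparse suffix array (SSA) sketch, assuming a fixed sampling step k.
# build_ssa, suffix_range and find_exact are hypothetical names for illustration;
# this is not the Ψ-RA source code.


def build_ssa(text, k):
    """Sort only suffixes starting at multiples of k (naive: copies each suffix, toy inputs only)."""
    return sorted(range(0, len(text), k), key=lambda i: text[i:])


def suffix_range(text, ssa, query):
    """Return [lo, hi) bounds of sampled suffixes whose prefix equals query."""
    m = len(query)
    lo, hi = 0, len(ssa)
    while lo < hi:  # lower bound: first m-prefix >= query
        mid = (lo + hi) // 2
        if text[ssa[mid]:ssa[mid] + m] < query:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(ssa)
    while lo < hi:  # upper bound: first m-prefix > query
        mid = (lo + hi) // 2
        if text[ssa[mid]:ssa[mid] + m] <= query:
            lo = mid + 1
        else:
            hi = mid
    return start, lo


def find_exact(text, ssa, k, pattern):
    """All exact occurrences of pattern in text; requires len(pattern) >= k."""
    if len(pattern) < k:
        raise ValueError("pattern must be at least as long as the sampling step k")
    hits = []
    # An occurrence at i covers the sampled position i + j, where j = (-i) % k < k.
    # The k shifts are independent, so they could be searched in parallel.
    for j in range(k):
        lo, hi = suffix_range(text, ssa, pattern[j:])
        for r in range(lo, hi):
            start = ssa[r] - j
            # Verify the j characters that precede the sampled position.
            if start >= 0 and text[start:start + j] == pattern[:j]:
                hits.append(start)
    return sorted(hits)


if __name__ == "__main__":
    genome = "ACGTACGTTACG"
    index = build_ssa(genome, 3)  # 4 sampled suffixes instead of 12
    print(find_exact(genome, index, 3, "ACG"))  # [0, 4, 9]
```

In this sketch a larger k shrinks the index roughly k-fold, at the cost of k binary searches per read and a minimum read length of k, which mirrors the memory/time trade-off described above.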
format	Online Article Text
id	pubmed-3194238
institution	National Center for Biotechnology Information
language	English
publishDate	2011
publisher	BioMed Central
record_format	MEDLINE/PubMed
journal	BMC Genomics
published	BioMed Central, 2011-07-27
rights	Copyright ©2011 Oğuzhan Külekci et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Ψ-RA: a parallel sparse index for genomic read alignment
topic	Proceedings

Similar items