
Indexing Arbitrary-Length k-Mers in Sequencing Reads

We propose a lightweight data structure for indexing and querying collections of NGS reads data in main memory. The data structure supports the interface proposed in the pioneering work by Philippe et al. for counting and locating k-mers in sequencing reads. Our solution, PgSA (pseudogenome suffix a...


Bibliographic Details
Main authors:	Kowalski, Tomasz; Grabowski, Szymon; Deorowicz, Sebastian
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2015
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4504488/ https://www.ncbi.nlm.nih.gov/pubmed/26182400 http://dx.doi.org/10.1371/journal.pone.0133198

author	Kowalski, Tomasz; Grabowski, Szymon; Deorowicz, Sebastian
collection	PubMed
description	We propose a lightweight data structure for indexing and querying collections of NGS reads data in main memory. The data structure supports the interface proposed in the pioneering work by Philippe et al. for counting and locating k-mers in sequencing reads. Our solution, PgSA (pseudogenome suffix array), which is based on finding overlapping reads, is competitive with existing algorithms in space use, query time, or both. The main applications of our index include variant calling, error correction, and analysis of reads from RNA-seq experiments.
format	Online Article Text
id	pubmed-4504488
institution	National Center for Biotechnology Information
language	English
publishDate	2015
publisher	Public Library of Science
record_format	MEDLINE/PubMed
journal	PLoS One
date	2015-07-16
rights	© 2015 Kowalski et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title	Indexing Arbitrary-Length k-Mers in Sequencing Reads
topic	Research Article
