Indexing Arbitrary-Length k-Mers in Sequencing Reads
We propose a lightweight data structure for indexing and querying collections of NGS reads data in main memory. The data structure supports the interface proposed in the pioneering work by Philippe et al. for counting and locating k-mers in sequencing reads. Our solution, PgSA (pseudogenome suffix a...
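To make the count and locate queries concrete, here is a minimal sketch of a suffix-array lookup over a collection of reads. It is not the PgSA implementation: as an assumption for illustration, the reads are simply joined with a `$` separator instead of being merged into an overlap-based pseudogenome, and the function names (`build_index`, `count`, `locate`) are hypothetical.

```python
from bisect import bisect_right

SEP = "$"  # assumed separator; must not occur in reads or queries


def build_index(reads):
    """Join reads with a separator and build a naive suffix array."""
    text = SEP.join(reads) + SEP
    # Naive construction by sorting suffixes; fine for a small demo only.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # Start offset of every read inside the concatenated text.
    starts, pos = [], 0
    for r in reads:
        starts.append(pos)
        pos += len(r) + 1
    return text, sa, starts


def _bound(text, sa, kmer, upper):
    """Binary search for the first suffix whose k-prefix is >= (or >) kmer."""
    k, lo, hi = len(kmer), 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        prefix = text[sa[mid]:sa[mid] + k]
        if prefix < kmer or (upper and prefix == kmer):
            lo = mid + 1
        else:
            hi = mid
    return lo


def count(index, kmer):
    """Number of occurrences of kmer across all reads."""
    text, sa, _ = index
    return _bound(text, sa, kmer, True) - _bound(text, sa, kmer, False)


def locate(index, kmer):
    """Sorted list of (read number, offset in read) for each occurrence."""
    text, sa, starts = index
    lo, hi = _bound(text, sa, kmer, False), _bound(text, sa, kmer, True)
    hits = []
    for p in sa[lo:hi]:
        r = bisect_right(starts, p) - 1  # read containing text position p
        hits.append((r, p - starts[r]))
    return sorted(hits)


index = build_index(["ACGTAC", "GTACGT", "TTACG"])
print(count(index, "ACG"), locate(index, "ACG"))
```

Because the separator never appears in a query, a match cannot span two reads, and the same index answers queries for any k without rebuilding.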
Main authors: | Kowalski, Tomasz; Grabowski, Szymon; Deorowicz, Sebastian |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2015 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4504488/ https://www.ncbi.nlm.nih.gov/pubmed/26182400 http://dx.doi.org/10.1371/journal.pone.0133198 |
_version_ | 1782381467935965184 |
---|---|
author | Kowalski, Tomasz; Grabowski, Szymon; Deorowicz, Sebastian |
author_sort | Kowalski, Tomasz |
collection | PubMed |
description | We propose a lightweight data structure for indexing and querying collections of NGS reads data in main memory. The data structure supports the interface proposed in the pioneering work by Philippe et al. for counting and locating k-mers in sequencing reads. Our solution, PgSA (pseudogenome suffix array), based on finding overlapping reads, is competitive to the existing algorithms in the space use, query times, or both. The main applications of our index include variant calling, error correction and analysis of reads from RNA-seq experiments. |
format | Online Article Text |
id | pubmed-4504488 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-4504488; 2015-07-17; Indexing Arbitrary-Length k-Mers in Sequencing Reads; Kowalski, Tomasz; Grabowski, Szymon; Deorowicz, Sebastian; PLoS One; Research Article; We propose a lightweight data structure for indexing and querying collections of NGS reads data in main memory. The data structure supports the interface proposed in the pioneering work by Philippe et al. for counting and locating k-mers in sequencing reads. Our solution, PgSA (pseudogenome suffix array), based on finding overlapping reads, is competitive to the existing algorithms in the space use, query times, or both. The main applications of our index include variant calling, error correction and analysis of reads from RNA-seq experiments.; Public Library of Science; 2015-07-16; /pmc/articles/PMC4504488/; /pubmed/26182400; http://dx.doi.org/10.1371/journal.pone.0133198; Text; en; © 2015 Kowalski et al; http://creativecommons.org/licenses/by/4.0/; This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
title | Indexing Arbitrary-Length k-Mers in Sequencing Reads |
topic | Research Article |