
A benchmark study of k-mer counting methods for high-throughput sequencing

The rapid development of high-throughput sequencing technologies means that hundreds of gigabytes of sequencing data can be produced in a single study. Many bioinformatics tools require counts of substrings of length k in DNA/RNA sequencing reads obtained for applications such as genome and transcri...


Bibliographic Details
Main Authors: Manekar, Swati C; Sathe, Shailesh R
Format: Online Article Text
Language: English
Published: Oxford University Press 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6280066/
https://www.ncbi.nlm.nih.gov/pubmed/30346548
http://dx.doi.org/10.1093/gigascience/giy125
_version_ 1783378593349369856
author Manekar, Swati C
Sathe, Shailesh R
author_facet Manekar, Swati C
Sathe, Shailesh R
author_sort Manekar, Swati C
collection PubMed
description The rapid development of high-throughput sequencing technologies means that hundreds of gigabytes of sequencing data can be produced in a single study. Many bioinformatics tools require counts of substrings of length k in DNA/RNA sequencing reads obtained for applications such as genome and transcriptome assembly, error correction, multiple sequence alignment, and repeat detection. Recently, several techniques have been developed to count k-mers in large sequencing datasets, with a trade-off between the time and memory required to perform this function. We assessed several k-mer counting programs and evaluated their relative performance, primarily on the basis of runtime and memory usage. We also considered additional parameters such as disk usage, accuracy, parallelism, the impact of compressed input, performance in terms of counting large k values and the scalability of the application to larger datasets. We make specific recommendations for the setup of a current state-of-the-art program and suggestions for further development.
format Online
Article
Text
id pubmed-6280066
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-62800662018-12-11 A benchmark study of k-mer counting methods for high-throughput sequencing Manekar, Swati C Sathe, Shailesh R Gigascience Review The rapid development of high-throughput sequencing technologies means that hundreds of gigabytes of sequencing data can be produced in a single study. Many bioinformatics tools require counts of substrings of length k in DNA/RNA sequencing reads obtained for applications such as genome and transcriptome assembly, error correction, multiple sequence alignment, and repeat detection. Recently, several techniques have been developed to count k-mers in large sequencing datasets, with a trade-off between the time and memory required to perform this function. We assessed several k-mer counting programs and evaluated their relative performance, primarily on the basis of runtime and memory usage. We also considered additional parameters such as disk usage, accuracy, parallelism, the impact of compressed input, performance in terms of counting large k values and the scalability of the application to larger datasets. We make specific recommendations for the setup of a current state-of-the-art program and suggestions for further development. Oxford University Press 2018-10-22 /pmc/articles/PMC6280066/ /pubmed/30346548 http://dx.doi.org/10.1093/gigascience/giy125 Text en © The Author(s) 2018. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Review
Manekar, Swati C
Sathe, Shailesh R
A benchmark study of k-mer counting methods for high-throughput sequencing
title A benchmark study of k-mer counting methods for high-throughput sequencing
title_full A benchmark study of k-mer counting methods for high-throughput sequencing
title_fullStr A benchmark study of k-mer counting methods for high-throughput sequencing
title_full_unstemmed A benchmark study of k-mer counting methods for high-throughput sequencing
title_short A benchmark study of k-mer counting methods for high-throughput sequencing
title_sort benchmark study of k-mer counting methods for high-throughput sequencing
topic Review
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6280066/
https://www.ncbi.nlm.nih.gov/pubmed/30346548
http://dx.doi.org/10.1093/gigascience/giy125
work_keys_str_mv AT manekarswatic abenchmarkstudyofkmercountingmethodsforhighthroughputsequencing
AT satheshaileshr abenchmarkstudyofkmercountingmethodsforhighthroughputsequencing
AT manekarswatic benchmarkstudyofkmercountingmethodsforhighthroughputsequencing
AT satheshaileshr benchmarkstudyofkmercountingmethodsforhighthroughputsequencing