KAnalyze: a fast versatile pipelined K-mer toolkit
Motivation: Converting nucleotide sequences into short overlapping fragments of uniform length, k-mers, is a common step in many bioinformatics applications. While existing software packages count k-mers, few are optimized for speed, offer an application programming interface (API), a graphical inte...
Main Authors: | Audano, Peter; Vannberg, Fredrik |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2014 |
Subjects: | Applications Notes |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4080738/ https://www.ncbi.nlm.nih.gov/pubmed/24642064 http://dx.doi.org/10.1093/bioinformatics/btu152 |
_version_ | 1782324031017451520 |
---|---|
author | Audano, Peter Vannberg, Fredrik |
author_sort | Audano, Peter |
collection | PubMed |
description | Motivation: Converting nucleotide sequences into short overlapping fragments of uniform length, k-mers, is a common step in many bioinformatics applications. While existing software packages count k-mers, few are optimized for speed, offer an application programming interface (API), a graphical interface or contain features that make it extensible and maintainable. We designed KAnalyze to compete with the fastest k-mer counters, to produce reliable output and to support future development efforts through well-architected, documented and testable code. Currently, KAnalyze can output k-mer counts in a sorted tab-delimited file or stream k-mers as they are read. KAnalyze can process large datasets with 2 GB of memory. This project is implemented in Java 7, and the command line interface (CLI) is designed to integrate into pipelines written in any language. Results: As a k-mer counter, KAnalyze outperforms Jellyfish, DSK and a pipeline built on Perl and Linux utilities. Through extensive unit and system testing, we have verified that KAnalyze produces the correct k-mer counts over multiple datasets and k-mer sizes. Availability and implementation: KAnalyze is available on SourceForge: https://sourceforge.net/projects/kanalyze/ Contact: fredrik.vannberg@biology.gatech.edu Supplementary information: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-4080738 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-4080738 2014-07-03 KAnalyze: a fast versatile pipelined K-mer toolkit Audano, Peter Vannberg, Fredrik Bioinformatics Applications Notes Oxford University Press 2014-07-15 2014-03-18 /pmc/articles/PMC4080738/ /pubmed/24642064 http://dx.doi.org/10.1093/bioinformatics/btu152 Text en © The Author 2014. Published by Oxford University Press. http://creativecommons.org/licenses/by/3.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | KAnalyze: a fast versatile pipelined K-mer toolkit |
topic | Applications Notes |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4080738/ https://www.ncbi.nlm.nih.gov/pubmed/24642064 http://dx.doi.org/10.1093/bioinformatics/btu152 |
work_keys_str_mv | AT audanopeter kanalyzeafastversatilepipelinedkmertoolkit AT vannbergfredrik kanalyzeafastversatilepipelinedkmertoolkit |
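
The description in this record mentions converting nucleotide sequences into short overlapping fragments of uniform length (k-mers) and writing the counts to a sorted tab-delimited file. As a rough illustration of that idea only, here is a minimal Python sketch. It is not KAnalyze's implementation or API (KAnalyze is written in Java), and the function names `count_kmers` and `format_counts`, the k-mer-then-count column order, and the choice to skip windows containing non-ACGT characters are all assumptions made for this example.

```python
from collections import Counter


def count_kmers(sequence: str, k: int) -> Counter:
    """Count every overlapping k-mer in a nucleotide sequence (hypothetical helper)."""
    if k <= 0:
        raise ValueError("k must be positive")
    seq = sequence.upper()
    counts = Counter()
    # Slide a window of width k across the sequence, one base at a time.
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Assumption of this sketch: skip windows with ambiguous bases such as N.
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def format_counts(counts: Counter) -> str:
    """Render counts as tab-delimited lines sorted by k-mer (hypothetical helper)."""
    return "\n".join(f"{kmer}\t{n}" for kmer, n in sorted(counts.items()))


if __name__ == "__main__":
    print(format_counts(count_kmers("ACGTACGT", 3)))
```

For the sequence `ACGTACGT` with k = 3, the six overlapping windows are ACG, CGT, GTA, TAC, ACG and CGT, so the sketch reports ACG and CGT twice each and GTA and TAC once each. An in-memory counter like this will not scale to the large datasets the description refers to; it only shows what the counts mean.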