
Clustering of reads with alignment-free measures and quality values

BACKGROUND: The data volume generated by Next-Generation Sequencing (NGS) technologies is growing at a pace that is now challenging the storage and data processing capacities of modern computer systems. In this context an important aspect is the reduction of data complexity by collapsing redundant r...


Bibliographic Details
Main authors:	Comin, Matteo, Leoni, Andrea, Schimd, Michele
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2015
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4331138/ https://www.ncbi.nlm.nih.gov/pubmed/25691913 http://dx.doi.org/10.1186/s13015-014-0029-x

collection	PubMed
description	BACKGROUND: The data volume generated by Next-Generation Sequencing (NGS) technologies is growing at a pace that now challenges the storage and data processing capacities of modern computer systems. In this context, an important aspect is the reduction of data complexity by collapsing redundant reads into a single cluster to improve the run time, memory requirements, and quality of post-processing steps such as assembly and error correction. Several alignment-free measures, based on k-mer counts, have been used to cluster reads. Quality scores produced by NGS platforms are fundamental for various analyses of NGS data, such as read mapping and error detection. Moreover, future-generation sequencing platforms will produce long reads but with a large number of erroneous bases (up to 15%). RESULTS: In this scenario it will be fundamental to exploit quality value information within the alignment-free framework. To the best of our knowledge, this is the first study that incorporates quality value information and k-mer counts, in the context of alignment-free measures, for the comparison of read data. Based on these principles, in this paper we present a family of alignment-free measures called D(q)-type. A set of experiments on simulated and real read data confirms that the new measures are superior to other classical alignment-free statistics, especially when erroneous reads are considered. Results on de novo assembly and metagenomic read classification also show that the introduction of quality values improves over standard alignment-free measures. These statistics are implemented in a software tool called QCluster (http://www.dei.unipd.it/~ciompin/main/qcluster.html).
id	pubmed-4331138
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
spelling	pubmed-4331138 2015-02-18 Algorithms Mol Biol Research BioMed Central 2015-01-28 /pmc/articles/PMC4331138/ /pubmed/25691913 http://dx.doi.org/10.1186/s13015-014-0029-x Text en © Comin et al.; licensee BioMed Central. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
