Fast comparison of genomic and meta-genomic reads with alignment-free measures based on quality values

BACKGROUND: Sequencing technologies are generating enormous amounts of read data; however, assembly of genomes and metagenomes remains among the most challenging tasks. In this paper we study the comparison of genomes and metagenomes based only on read data, using word-count statistics called alignme...

Bibliographic Details
Main Authors:	Comin, Matteo, Schimd, Michele
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2016
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4989896/ https://www.ncbi.nlm.nih.gov/pubmed/27535823 http://dx.doi.org/10.1186/s12920-016-0193-6

_version_	1782448623700672512
author	Comin, Matteo Schimd, Michele
collection	PubMed
description	BACKGROUND: Sequencing technologies are generating enormous amounts of read data; however, assembly of genomes and metagenomes remains among the most challenging tasks. In this paper we study the comparison of genomes and metagenomes based only on read data, using word-count statistics called alignment-free measures, which do not require reference genomes or assemblies. Quality scores produced by sequencing platforms are fundamental for various analyses; moreover, future-generation sequencing platforms will produce longer reads but with an error rate around 15%. In this context it will be fundamental to exploit quality-value information within the framework of alignment-free measures. RESULTS: In this paper we present a family of alignment-free measures, called d(q)-type, that are based on k-mer counts and quality values. These statistics can be used to compare genomes and metagenomes based on their read sets. Results show that the evolutionary relationship of genomes can be reconstructed based on the direct comparison of their read sets. CONCLUSION: The use of quality values on average improves the classification accuracy, and its contribution increases when the reads are noisier. The comparison of metagenomic microbial communities can also be performed efficiently. Similar metagenomes are quickly detected just by processing their read data, without the need for costly alignments.
format	Online Article Text
id	pubmed-4989896
institution	National Center for Biotechnology Information
language	English
publishDate	2016
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-4989896 2016-08-30 Fast comparison of genomic and meta-genomic reads with alignment-free measures based on quality values Comin, Matteo Schimd, Michele BMC Med Genomics Research
BioMed Central 2016-08-12 /pmc/articles/PMC4989896/ /pubmed/27535823 http://dx.doi.org/10.1186/s12920-016-0193-6 Text en © The Author(s) 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	Fast comparison of genomic and meta-genomic reads with alignment-free measures based on quality values
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4989896/ https://www.ncbi.nlm.nih.gov/pubmed/27535823 http://dx.doi.org/10.1186/s12920-016-0193-6


Similar Items