Fast comparison of genomic and meta-genomic reads with alignment-free measures based on quality values

Bibliographic Details
Main authors: Comin, Matteo; Schimd, Michele
Format: Online article (text)
Language: English
Published: BioMed Central, 2016
Subjects: Research
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4989896/
https://www.ncbi.nlm.nih.gov/pubmed/27535823
http://dx.doi.org/10.1186/s12920-016-0193-6
collection PubMed
description BACKGROUND: Sequencing technologies are generating enormous amounts of read data; however, the assembly of genomes and metagenomes remains among the most challenging tasks. In this paper we study the comparison of genomes and metagenomes based only on read data, using word-count statistics, called alignment-free measures, which do not require reference genomes or assemblies. Quality scores produced by sequencing platforms are fundamental for various analyses; moreover, future-generation sequencing platforms will produce longer reads, but with error rates around 15 %. In this context, it will be essential to exploit quality-value information within the framework of alignment-free measures. RESULTS: In this paper we present a family of alignment-free measures, called d(q)-type, that are based on k-mer counts and quality values. These statistics can be used to compare genomes and metagenomes based on their read sets. Results show that the evolutionary relationships of genomes can be reconstructed from the direct comparison of their read sets. CONCLUSION: On average, the use of quality values improves classification accuracy, and its contribution increases as the reads become noisier. The comparison of metagenomic microbial communities can also be performed efficiently: similar metagenomes are quickly detected just by processing their read data, without the need for costly alignments.
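The abstract does not spell out how quality values enter the k-mer statistics. As a rough illustration only, the Python sketch below assumes one plausible scheme: each k-mer occurrence is weighted by the product of the correctness probabilities of its bases (derived from Phred scores as 1 - 10^(-Q/10)) instead of being counted as 1, and two read sets are then compared with a D2-style inner product normalized into a cosine distance. The function names (phred_to_correct_prob, quality_weighted_kmer_counts, d2, d2_cosine_distance) and the normalization are hypothetical; they are not the authors' d(q)-type definitions or software.

```python
from collections import defaultdict
from math import sqrt


def phred_to_correct_prob(q: int) -> float:
    """Probability that a base with Phred score q was called correctly."""
    return 1.0 - 10.0 ** (-q / 10.0)


def quality_weighted_kmer_counts(reads, k):
    """Build a quality-weighted k-mer profile from a read set.

    reads: iterable of (sequence, phred_scores) pairs of equal length.
    Assumed weighting: each k-mer occurrence adds the product of the
    correctness probabilities of its k bases, rather than 1.
    """
    counts = defaultdict(float)
    for seq, quals in reads:
        if len(seq) != len(quals):
            raise ValueError("sequence and quality lengths differ")
        probs = [phred_to_correct_prob(q) for q in quals]
        # Slide a window of length k over the read (no-op if read < k).
        for i in range(len(seq) - k + 1):
            weight = 1.0
            for p in probs[i:i + k]:
                weight *= p
            counts[seq[i:i + k]] += weight
    return dict(counts)


def d2(x, y):
    """D2-style inner product: sum over shared k-mers of x_w * y_w."""
    return sum(v * y[w] for w, v in x.items() if w in y)


def d2_cosine_distance(x, y):
    """1 - D2 / (|x| |y|): 0 for proportional profiles, 1 for disjoint ones.

    This normalization is an illustrative assumption.
    """
    nx = sqrt(sum(v * v for v in x.values()))
    ny = sqrt(sum(v * v for v in y.values()))
    if nx == 0.0 or ny == 0.0:
        return 1.0
    return 1.0 - d2(x, y) / (nx * ny)


if __name__ == "__main__":
    read_set_a = [("ACGTACGT", [40] * 8)]
    read_set_b = [("ACGTACGA", [40] * 7 + [5])]
    a = quality_weighted_kmer_counts(read_set_a, 3)
    b = quality_weighted_kmer_counts(read_set_b, 3)
    print(d2_cosine_distance(a, b))
```

Under this weighting, a single low-quality base lowers the contribution of every k-mer that overlaps it, so k-mers likely created by sequencing errors weigh less in the comparison. The sketch is meant to convey that intuition, not to reproduce the paper's measures or results.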
id pubmed-4989896
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal BMC Med Genomics (Research)
BioMed Central, 2016-08-12. © The Author(s) 2016. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
topic Research