Fast comparison of genomic and meta-genomic reads with alignment-free measures based on quality values
BACKGROUND: Sequencing technologies are generating enormous amounts of read data; however, the assembly of genomes and metagenomes remains among the most challenging tasks. In this paper we study the comparison of genomes and metagenomes based only on read data, using word-count statistics called alignme...
Main Authors: | Comin, Matteo; Schimd, Michele |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2016 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4989896/ https://www.ncbi.nlm.nih.gov/pubmed/27535823 http://dx.doi.org/10.1186/s12920-016-0193-6 |
_version_ | 1782448623700672512 |
---|---|
author | Comin, Matteo Schimd, Michele |
author_facet | Comin, Matteo Schimd, Michele |
author_sort | Comin, Matteo |
collection | PubMed |
description | BACKGROUND: Sequencing technologies are generating enormous amounts of read data; however, the assembly of genomes and metagenomes remains among the most challenging tasks. In this paper we study the comparison of genomes and metagenomes based only on read data, using word-count statistics called alignment-free measures, which do not require reference genomes or assemblies. Quality scores produced by sequencing platforms are fundamental for various analyses; moreover, future-generation sequencing platforms will produce longer reads but with an error rate of around 15%. In this context it will be fundamental to exploit quality-value information within the framework of alignment-free measures. RESULTS: In this paper we present a family of alignment-free measures, called d(q)-type, that are based on k-mer counts and quality values. These statistics can be used to compare genomes and metagenomes based on their read sets. Results show that the evolutionary relationship of genomes can be reconstructed based on the direct comparison of their read sets. CONCLUSION: The use of quality values on average improves the classification accuracy, and its contribution increases when the reads are noisier. The comparison of metagenomic microbial communities can also be performed efficiently. Similar metagenomes are quickly detected just by processing their read data, without the need for costly alignments. |
format | Online Article Text |
id | pubmed-4989896 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4989896 2016-08-30 Fast comparison of genomic and meta-genomic reads with alignment-free measures based on quality values Comin, Matteo Schimd, Michele BMC Med Genomics Research BACKGROUND: Sequencing technologies are generating enormous amounts of read data; however, the assembly of genomes and metagenomes remains among the most challenging tasks. In this paper we study the comparison of genomes and metagenomes based only on read data, using word-count statistics called alignment-free measures, which do not require reference genomes or assemblies. Quality scores produced by sequencing platforms are fundamental for various analyses; moreover, future-generation sequencing platforms will produce longer reads but with an error rate of around 15%. In this context it will be fundamental to exploit quality-value information within the framework of alignment-free measures. RESULTS: In this paper we present a family of alignment-free measures, called d(q)-type, that are based on k-mer counts and quality values. These statistics can be used to compare genomes and metagenomes based on their read sets. Results show that the evolutionary relationship of genomes can be reconstructed based on the direct comparison of their read sets. CONCLUSION: The use of quality values on average improves the classification accuracy, and its contribution increases when the reads are noisier. The comparison of metagenomic microbial communities can also be performed efficiently. Similar metagenomes are quickly detected just by processing their read data, without the need for costly alignments. BioMed Central 2016-08-12 /pmc/articles/PMC4989896/ /pubmed/27535823 http://dx.doi.org/10.1186/s12920-016-0193-6 Text en © The Author(s) 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Comin, Matteo Schimd, Michele Fast comparison of genomic and meta-genomic reads with alignment-free measures based on quality values |
title | Fast comparison of genomic and meta-genomic reads with alignment-free measures based on quality values |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4989896/ https://www.ncbi.nlm.nih.gov/pubmed/27535823 http://dx.doi.org/10.1186/s12920-016-0193-6 |
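
The description above mentions alignment-free measures built from k-mer counts and quality values but does not give their definition. The following is a minimal Python sketch of one plausible reading, not the authors' published d(q)-type statistics: each k-mer occurrence is weighted by the probability that all of its bases were called correctly (derived from Phred scores), and two read sets are compared with a plain D2-style inner product of the weighted counts. The function names, the weighting scheme, and the absence of any centering or normalization by expected counts are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def phred_to_prob_correct(q):
    """Probability that a base call with Phred score q is correct."""
    return 1.0 - 10.0 ** (-q / 10.0)


def quality_weighted_kmer_counts(reads, k):
    """Accumulate quality-weighted k-mer counts over a read set.

    reads: iterable of (sequence, phred_scores) pairs, where phred_scores
    is a list of integers with one entry per base.
    Assumption: each occurrence contributes the product of the per-base
    probabilities of being correct, instead of contributing 1.
    """
    counts = defaultdict(float)
    for seq, quals in reads:
        if len(seq) != len(quals):
            raise ValueError("sequence and quality lengths differ")
        probs = [phred_to_prob_correct(q) for q in quals]
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += math.prod(probs[i:i + k])
    return dict(counts)


def d2_style_similarity(counts_a, counts_b):
    """Inner product of two weighted k-mer count vectors (D2-style).

    Assumption: no centering by expected counts and no normalization,
    which published D2 variants typically add.
    """
    if len(counts_a) > len(counts_b):
        counts_a, counts_b = counts_b, counts_a  # iterate over the smaller map
    return sum(v * counts_b.get(w, 0.0) for w, v in counts_a.items())


# Hypothetical toy input: two tiny read sets with per-base Phred scores.
sample_a = [("ACGTAC", [30, 30, 30, 30, 30, 30])]
sample_b = [("ACGTTT", [30, 30, 30, 10, 10, 10])]
score = d2_style_similarity(
    quality_weighted_kmer_counts(sample_a, k=3),
    quality_weighted_kmer_counts(sample_b, k=3),
)
```

Under this weighting, k-mers that overlap low-quality bases contribute less to the score, which is in line with the record's statement that the contribution of quality values grows as reads become noisier.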