Assembly-free genome comparison based on next-generation sequencing reads and variable length patterns
BACKGROUND: With the advent of Next-Generation Sequencing technologies (NGS), a large amount of short read data has been generated. If a reference genome is not available, the assembly of a template sequence is usually challenging because of repeats and the short length of reads. When NGS reads cann...
Main Authors: | Comin, Matteo; Schimd, Michele |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2014 |
Subjects: | Proceedings |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4168702/ https://www.ncbi.nlm.nih.gov/pubmed/25252700 http://dx.doi.org/10.1186/1471-2105-15-S9-S1 |
_version_ | 1782335602033688576 |
---|---|
author | Comin, Matteo; Schimd, Michele |
collection | PubMed |
description | BACKGROUND: With the advent of Next-Generation Sequencing (NGS) technologies, a large amount of short-read data has been generated. If a reference genome is not available, the assembly of a template sequence is usually challenging because of repeats and the short length of reads. When NGS reads cannot be mapped onto a reference genome, alignment-based methods are not applicable. However, it is still possible to study the evolutionary relationship of unassembled genomes based on NGS data. RESULTS: We present a parameter-free, alignment-free method, called [Formula: see text], based on variable-length patterns, for the direct comparison of sets of NGS reads. We define a similarity measure using variable-length patterns, as well as reverses and reverse-complements, along with their statistical and syntactical properties. We evaluate several alignment-free statistics on the comparison of NGS reads coming from simulated and real genomes. In almost all simulations, our method [Formula: see text] outperforms all other statistics. The performance gain becomes more evident when real genomes are used. CONCLUSION: The new alignment-free statistic is highly successful in discriminating related genomes based on NGS read data. In almost all experiments, it outperforms traditional alignment-free statistics that are based on fixed-length patterns. |
format | Online Article Text |
id | pubmed-4168702 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4168702 2014-10-02 Assembly-free genome comparison based on next-generation sequencing reads and variable length patterns Comin, Matteo; Schimd, Michele BMC Bioinformatics Proceedings BACKGROUND: With the advent of Next-Generation Sequencing (NGS) technologies, a large amount of short-read data has been generated. If a reference genome is not available, the assembly of a template sequence is usually challenging because of repeats and the short length of reads. When NGS reads cannot be mapped onto a reference genome, alignment-based methods are not applicable. However, it is still possible to study the evolutionary relationship of unassembled genomes based on NGS data. RESULTS: We present a parameter-free, alignment-free method, called [Formula: see text], based on variable-length patterns, for the direct comparison of sets of NGS reads. We define a similarity measure using variable-length patterns, as well as reverses and reverse-complements, along with their statistical and syntactical properties. We evaluate several alignment-free statistics on the comparison of NGS reads coming from simulated and real genomes. In almost all simulations, our method [Formula: see text] outperforms all other statistics. The performance gain becomes more evident when real genomes are used. CONCLUSION: The new alignment-free statistic is highly successful in discriminating related genomes based on NGS read data. In almost all experiments, it outperforms traditional alignment-free statistics that are based on fixed-length patterns. BioMed Central 2014-09-10 /pmc/articles/PMC4168702/ /pubmed/25252700 http://dx.doi.org/10.1186/1471-2105-15-S9-S1 Text en Copyright © 2014 Comin and Schimd; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | Assembly-free genome comparison based on next-generation sequencing reads and variable length patterns |
topic | Proceedings |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4168702/ https://www.ncbi.nlm.nih.gov/pubmed/25252700 http://dx.doi.org/10.1186/1471-2105-15-S9-S1 |