LAF: Logic Alignment Free and its application to bacterial genomes classification

Bibliographic Details
Main Authors: Weitschek, Emanuel; Cunial, Fabio; Felici, Giovanni
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects: Methodology
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4673791/
https://www.ncbi.nlm.nih.gov/pubmed/26664519
http://dx.doi.org/10.1186/s13040-015-0073-1
author Weitschek, Emanuel
Cunial, Fabio
Felici, Giovanni
collection PubMed
description Alignment-free algorithms can be used to estimate the similarity of biological sequences and hence are often applied to the phylogenetic reconstruction of genomes. Most of these algorithms rely on comparing the frequency of all the distinct substrings of fixed length (k-mers) that occur in the analyzed sequences. In this paper, we present Logic Alignment Free (LAF), a method that combines alignment-free techniques and rule-based classification algorithms in order to assign biological samples to their taxa. This method searches for a minimal subset of k-mers whose relative frequencies are used to build classification models as disjunctive-normal-form logic formulas (if-then rules). We apply LAF successfully to the classification of bacterial genomes into their corresponding taxa. In particular, we succeed in obtaining reliable classification at different taxonomic levels by extracting a handful of rules, each one based on the frequency of just a few k-mers. State-of-the-art methods to adjust the frequency of k-mers to the character distribution of the underlying genomes have negligible impact on classification performance, suggesting that the signal of each class is strong and that LAF is effective in identifying it.
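The two ingredients named in the description, relative k-mer frequencies and if-then rules in disjunctive normal form, can be illustrated with a minimal sketch. This is not the LAF implementation: the function names, the rule encoding, the k-mers, and the thresholds below are all hypothetical and chosen only to show how a frequency-based DNF rule could be evaluated on one sequence.

```python
from collections import Counter


def kmer_relative_frequencies(sequence, k):
    """Return {k-mer: count / number of k-mer positions} for one sequence."""
    sequence = sequence.upper()
    total = len(sequence) - k + 1
    if total <= 0:
        return {}
    counts = Counter(sequence[i:i + k] for i in range(total))
    return {kmer: n / total for kmer, n in counts.items()}


def satisfies_dnf(freqs, rule):
    """Evaluate a rule in disjunctive normal form.

    `rule` is a list of clauses joined by OR; each clause is a list of
    (kmer, op, threshold) conditions joined by AND, with op ">=" or "<".
    A k-mer absent from `freqs` is treated as having frequency 0.
    """
    def holds(kmer, op, threshold):
        value = freqs.get(kmer, 0.0)
        if op == ">=":
            return value >= threshold
        if op == "<":
            return value < threshold
        raise ValueError(f"unsupported operator: {op}")

    return any(all(holds(*cond) for cond in clause) for clause in rule)


# Hypothetical rule (illustration only, not taken from the paper):
# (f(AC) >= 0.3 AND f(GG) < 0.1) OR (f(TT) >= 0.5)
toy_rule = [[("AC", ">=", 0.3), ("GG", "<", 0.1)], [("TT", ">=", 0.5)]]

freqs = kmer_relative_frequencies("ACACACGT", 2)
print(satisfies_dnf(freqs, toy_rule))
```

In this toy setting a sample would be assigned to a class when the class rule evaluates to true. How LAF selects the k-mer subset and learns the thresholds is not covered by this sketch.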
format Online
Article
Text
id pubmed-4673791
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4673791 2015-12-10 LAF: Logic Alignment Free and its application to bacterial genomes classification Weitschek, Emanuel Cunial, Fabio Felici, Giovanni BioData Min Methodology BioMed Central 2015-12-08 /pmc/articles/PMC4673791/ /pubmed/26664519 http://dx.doi.org/10.1186/s13040-015-0073-1 Text en © Weitschek et al. 2015 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title LAF: Logic Alignment Free and its application to bacterial genomes classification
topic Methodology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4673791/
https://www.ncbi.nlm.nih.gov/pubmed/26664519
http://dx.doi.org/10.1186/s13040-015-0073-1