
Mycofier: a new machine learning-based classifier for fungal ITS sequences

BACKGROUND: The taxonomic and phylogenetic classification based on sequence analysis of the ITS1 genomic region has become a crucial component of fungal ecology and diversity studies. Nowadays, there is no accurate alignment-free classification tool for fungal ITS1 sequences for large environmental...


Bibliographic Details
Main Authors: Delgado-Serrano, Luisa, Restrepo, Silvia, Bustos, Jose Ricardo, Zambrano, Maria Mercedes, Anzola, Juan Manuel
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4982325/
https://www.ncbi.nlm.nih.gov/pubmed/27516337
http://dx.doi.org/10.1186/s13104-016-2203-3
author Delgado-Serrano, Luisa
Restrepo, Silvia
Bustos, Jose Ricardo
Zambrano, Maria Mercedes
Anzola, Juan Manuel
collection PubMed
description BACKGROUND: The taxonomic and phylogenetic classification based on sequence analysis of the ITS1 genomic region has become a crucial component of fungal ecology and diversity studies. Nowadays, there is no accurate alignment-free classification tool for fungal ITS1 sequences for large environmental surveys. This study describes the development of a machine learning-based classifier for the taxonomical assignment of fungal ITS1 sequences at the genus level. RESULTS: A fungal ITS1 sequence database was built using curated data. Training and test sets were generated from it. A Naïve Bayesian classifier was built using features from the primary sequence with an accuracy of 87 % in the classification at the genus level. CONCLUSIONS: The final model was based on a Naïve Bayes algorithm using ITS1 sequences from 510 fungal genera. This classifier, denoted as Mycofier, provides similar classification accuracy compared to BLASTN, but the database used for the classification contains curated data and the tool, independent of alignment, is more efficient and contributes to the field, given the lack of an accurate classification tool for large data from fungal ITS1 sequences. The software and source code for Mycofier are freely available at https://github.com/ldelgado-serrano/mycofier.git. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13104-016-2203-3) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4982325
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4982325 2016-08-13 Mycofier: a new machine learning-based classifier for fungal ITS sequences Delgado-Serrano, Luisa Restrepo, Silvia Bustos, Jose Ricardo Zambrano, Maria Mercedes Anzola, Juan Manuel BMC Res Notes Research Article
BioMed Central 2016-08-11 /pmc/articles/PMC4982325/ /pubmed/27516337 http://dx.doi.org/10.1186/s13104-016-2203-3 Text en © The Author(s) 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Mycofier: a new machine learning-based classifier for fungal ITS sequences
topic Research Article
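The abstract in this record describes a Naïve Bayes classifier built on features from the primary sequence, without alignment. The sketch below illustrates that general idea only; it is not the Mycofier implementation. It assumes overlapping k-mer counts as the sequence features (the record does not say which features were used), uses a multinomial Naïve Bayes model with Laplace smoothing, and trains on made-up toy sequences with hypothetical genus labels (`GenusA`, `GenusB`). All class, function, and parameter names are illustrative.

```python
import math
from collections import Counter, defaultdict


def kmers(seq, k=3):
    """Return overlapping k-mers, skipping any that contain non-ACGT letters."""
    seq = seq.upper()
    return [
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set("ACGT")
    ]


class KmerNaiveBayes:
    """Multinomial Naive Bayes over k-mer counts (illustrative assumption)."""

    def __init__(self, k=3, alpha=1.0):
        self.k = k                          # k-mer length (assumed, not from the source)
        self.alpha = alpha                  # Laplace smoothing constant
        self.counts = defaultdict(Counter)  # genus -> k-mer counts
        self.totals = Counter()             # genus -> total number of k-mers seen
        self.n_seqs = Counter()             # genus -> number of training sequences

    def fit(self, sequences, genera):
        for seq, genus in zip(sequences, genera):
            found = kmers(seq, self.k)
            self.counts[genus].update(found)
            self.totals[genus] += len(found)
            self.n_seqs[genus] += 1
        return self

    def log_scores(self, seq):
        """Return log prior + log likelihood for each genus (unnormalized)."""
        vocab_size = 4 ** self.k
        n_total = sum(self.n_seqs.values())
        query = Counter(kmers(seq, self.k))
        scores = {}
        for genus in self.n_seqs:
            score = math.log(self.n_seqs[genus] / n_total)
            denom = self.totals[genus] + self.alpha * vocab_size
            for kmer, count in query.items():
                prob = (self.counts[genus][kmer] + self.alpha) / denom
                score += count * math.log(prob)
            scores[genus] = score
        return scores

    def predict(self, seq):
        scores = self.log_scores(seq)
        return max(scores, key=scores.get)


# Toy training data: invented sequences, not real ITS1 records.
train = [
    ("ATATATTATAATATTA", "GenusA"),
    ("TTATATAATATATTAT", "GenusA"),
    ("GCGCGGCCGCGGGCGC", "GenusB"),
    ("CCGCGGCGCGCCGGCG", "GenusB"),
]
model = KmerNaiveBayes(k=3).fit(
    [seq for seq, _ in train], [genus for _, genus in train]
)

print(model.predict("ATTATATATA"))
print(model.predict("GGCGCGCCGC"))
```

Because scoring only counts k-mers and sums log probabilities, the cost per query grows with sequence length and the number of genera, with no alignment step. That is the kind of efficiency argument the abstract makes when comparing against BLASTN, although the accuracy figures reported there apply to the authors' model and curated database, not to this toy example.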