
CNN_FunBar: Advanced Learning Technique for Fungi ITS Region Classification

Fungal species identification from metagenomic data is a highly challenging task. The Internal Transcribed Spacer (ITS) region is a potential DNA marker for fungal taxonomy prediction. Computational approaches, especially deep learning algorithms, are highly efficient for better pattern recognition and c...


Bibliographic Details
Main Authors: Das, Ritwika; Rai, Anil; Mishra, Dwijesh Chandra
Format: Online Article Text
Language: English
Published: MDPI 2023
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10048311/
https://www.ncbi.nlm.nih.gov/pubmed/36980906
http://dx.doi.org/10.3390/genes14030634
collection PubMed
description Fungal species identification from metagenomic data is a highly challenging task. The Internal Transcribed Spacer (ITS) region is a potential DNA marker for fungal taxonomy prediction. Computational approaches, especially deep learning algorithms, are highly efficient at pattern recognition and classification of large datasets compared with in silico techniques such as BLAST and conventional machine learning methods. In this study, we present CNN_FunBar, a convolutional neural network-based approach for the classification of fungal ITS sequences from UNITE+INSDC reference datasets. The effects of convolution kernel size, number of filters, k-mer size, degree of diversity and category-wise frequency of ITS sequences on the classification performance of CNN models were assessed at all taxonomic levels (species, genus, family, order, class and phylum). CNN models can achieve >93% average accuracy in classifying ITS sequences from balanced datasets with 500 sequences per category and 6-mer frequency features at all levels. The comparative study revealed that CNN_FunBar can outperform machine learning-based algorithms (SVM, KNN, Naïve-Bayes and Random Forest) as well as existing fungal taxonomy prediction software (funbarRF, Mothur, RDP Classifier and SINTAX). The present study will be helpful for fungal taxonomy classification using large metagenomic datasets.
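The abstract refers to 6-mer frequency features without showing how such a vector is computed. The following is a minimal sketch of one common way to do it, not the authors' implementation: the function name `kmer_frequencies`, the restriction to the A/C/G/T alphabet, the skipping of windows with ambiguous bases, and the normalization to relative frequencies are all assumptions made for illustration.

```python
from itertools import product


def kmer_frequencies(seq, k=6):
    """Return relative k-mer frequencies for a DNA sequence.

    Hypothetical helper (not from the paper). The output has 4**k entries,
    ordered lexicographically over the alphabet A, C, G, T.
    """
    alphabet = "ACGT"
    # Map every possible k-mer to a fixed position in the feature vector.
    index = {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}
    counts = [0] * len(index)
    seq = seq.upper()
    total = 0
    # Slide a window of width k over the sequence, one base at a time.
    for i in range(len(seq) - k + 1):
        position = index.get(seq[i:i + k])
        if position is not None:  # assumption: skip windows containing N etc.
            counts[position] += 1
            total += 1
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]


# A 6-mer vector has 4**6 = 4096 entries per sequence.
features = kmer_frequencies("ACGTACGTACGTTTGACA", k=6)
print(len(features))
```

A vector like this, one per ITS sequence, is the kind of fixed-length input a convolutional classifier could consume. How CNN_FunBar actually encodes and scales its features is not stated in this record.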
id pubmed-10048311
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Genes (Basel)
published 2023-03-03
license © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).