CNN_FunBar: Advanced Learning Technique for Fungi ITS Region Classification
Fungal species identification from metagenomic data is a highly challenging task. Internal Transcribed Spacer (ITS) region is a potential DNA marker for fungi taxonomy prediction. Computational approaches, especially deep learning algorithms, are highly efficient for better pattern recognition and c...
Main Authors: | Das, Ritwika; Rai, Anil; Mishra, Dwijesh Chandra |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2023 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10048311/ https://www.ncbi.nlm.nih.gov/pubmed/36980906 http://dx.doi.org/10.3390/genes14030634 |
_version_ | 1785014151897153536 |
---|---|
author | Das, Ritwika; Rai, Anil; Mishra, Dwijesh Chandra |
author_facet | Das, Ritwika; Rai, Anil; Mishra, Dwijesh Chandra |
author_sort | Das, Ritwika |
collection | PubMed |
description | Fungal species identification from metagenomic data is a highly challenging task. Internal Transcribed Spacer (ITS) region is a potential DNA marker for fungi taxonomy prediction. Computational approaches, especially deep learning algorithms, are highly efficient for better pattern recognition and classification of large datasets compared to in silico techniques such as BLAST and machine learning methods. Here in this study, we present CNN_FunBar, a convolutional neural network-based approach for the classification of fungi ITS sequences from UNITE+INSDC reference datasets. Effects of convolution kernel size, filter numbers, k-mer size, degree of diversity and category-wise frequency of ITS sequences on classification performances of CNN models have been assessed at all taxonomic levels (species, genus, family, order, class and phylum). It is observed that CNN models can produce >93% average accuracy for classifying ITS sequences from balanced datasets with 500 sequences per category and 6-mer frequency features at all levels. The comparative study has revealed that CNN_FunBar can outperform machine learning-based algorithms (SVM, KNN, Naïve-Bayes and Random Forest) as well as existing fungal taxonomy prediction software (funbarRF, Mothur, RDP Classifier and SINTAX). The present study will be helpful for fungal taxonomy classification using large metagenomic datasets. |
format | Online Article Text |
id | pubmed-10048311 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-100483112023-03-29 CNN_FunBar: Advanced Learning Technique for Fungi ITS Region Classification Das, Ritwika Rai, Anil Mishra, Dwijesh Chandra Genes (Basel) Article Fungal species identification from metagenomic data is a highly challenging task. Internal Transcribed Spacer (ITS) region is a potential DNA marker for fungi taxonomy prediction. Computational approaches, especially deep learning algorithms, are highly efficient for better pattern recognition and classification of large datasets compared to in silico techniques such as BLAST and machine learning methods. Here in this study, we present CNN_FunBar, a convolutional neural network-based approach for the classification of fungi ITS sequences from UNITE+INSDC reference datasets. Effects of convolution kernel size, filter numbers, k-mer size, degree of diversity and category-wise frequency of ITS sequences on classification performances of CNN models have been assessed at all taxonomic levels (species, genus, family, order, class and phylum). It is observed that CNN models can produce >93% average accuracy for classifying ITS sequences from balanced datasets with 500 sequences per category and 6-mer frequency features at all levels. The comparative study has revealed that CNN_FunBar can outperform machine learning-based algorithms (SVM, KNN, Naïve-Bayes and Random Forest) as well as existing fungal taxonomy prediction software (funbarRF, Mothur, RDP Classifier and SINTAX). The present study will be helpful for fungal taxonomy classification using large metagenomic datasets. MDPI 2023-03-03 /pmc/articles/PMC10048311/ /pubmed/36980906 http://dx.doi.org/10.3390/genes14030634 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Das, Ritwika Rai, Anil Mishra, Dwijesh Chandra CNN_FunBar: Advanced Learning Technique for Fungi ITS Region Classification |
title | CNN_FunBar: Advanced Learning Technique for Fungi ITS Region Classification |
title_full | CNN_FunBar: Advanced Learning Technique for Fungi ITS Region Classification |
title_fullStr | CNN_FunBar: Advanced Learning Technique for Fungi ITS Region Classification |
title_full_unstemmed | CNN_FunBar: Advanced Learning Technique for Fungi ITS Region Classification |
title_short | CNN_FunBar: Advanced Learning Technique for Fungi ITS Region Classification |
title_sort | cnn_funbar: advanced learning technique for fungi its region classification |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10048311/ https://www.ncbi.nlm.nih.gov/pubmed/36980906 http://dx.doi.org/10.3390/genes14030634 |
work_keys_str_mv | AT dasritwika cnnfunbaradvancedlearningtechniqueforfungiitsregionclassification AT raianil cnnfunbaradvancedlearningtechniqueforfungiitsregionclassification AT mishradwijeshchandra cnnfunbaradvancedlearningtechniqueforfungiitsregionclassification |
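
The description above reports results for CNN models trained on 6-mer frequency features. As a rough illustration of what such a feature vector could look like, the sketch below computes relative k-mer frequencies for one DNA sequence. The function name `kmer_frequencies`, the choice to skip windows containing ambiguous bases, and the normalization by total window count are all assumptions made for this example; the record does not say how CNN_FunBar builds its features.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequence: str, k: int = 6) -> list[float]:
    """Return relative frequencies of all 4**k DNA k-mers in lexicographic order.

    Hypothetical helper for illustration only; not taken from CNN_FunBar.
    """
    sequence = sequence.upper()
    valid_bases = set("ACGT")

    # Slide a window of length k over the sequence and count each k-mer.
    # Assumption: windows containing ambiguous bases (e.g. N) are skipped.
    counts = Counter(
        sequence[i:i + k]
        for i in range(len(sequence) - k + 1)
        if set(sequence[i:i + k]) <= valid_bases
    )
    total = sum(counts.values())

    # Fixed vocabulary order so every sequence maps to the same vector layout.
    vocabulary = ("".join(bases) for bases in product("ACGT", repeat=k))

    # Counter returns 0 for k-mers that never occurred.
    return [counts[kmer] / total if total else 0.0 for kmer in vocabulary]


if __name__ == "__main__":
    features = kmer_frequencies("ACGTACGTTGCA", k=6)
    print(len(features))  # one entry per possible 6-mer
```

With `k=6` each sequence becomes a vector of 4096 values, which is one plausible input shape for a one-dimensional convolutional classifier; shorter k values give smaller, denser vectors.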