funbarRF: DNA barcode-based fungal species prediction using multiclass Random Forest supervised learning model

Bibliographic Details
Main authors: Meher, Prabina Kumar; Sahu, Tanmaya Kumar; Gahoi, Shachi; Tomar, Ruchi; Rao, Atmakuri Ramakrishna
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6323839/
https://www.ncbi.nlm.nih.gov/pubmed/30616524
http://dx.doi.org/10.1186/s12863-018-0710-z
collection PubMed
description BACKGROUND: Identification of unknown fungal species aids the conservation of fungal diversity. Because many fungal species cannot be cultured, their morphological identification is almost impossible. However, such species can be identified with the DNA barcoding technique. For fungal taxonomy prediction, the ITS (internal transcribed spacer) region of rDNA (ribosomal DNA) is used as the barcode. Although computational prediction of fungal species has become feasible with the large volume of barcode sequences available in the public domain, it remains challenging because of the high degree of variability among ITS regions within a species. RESULTS: A Random Forest (RF)-based predictor was built to identify unknown fungal species. The reference and query sequences were mapped onto numeric features based on gapped base pair compositions, which were then used as the training and test sets, respectively, for RF-based prediction of fungal species. Accuracy exceeded 85% when 4 sequences per species were included in the reference set, and it stabilized at ~88% when ≥7 sequences per species were used to train the model. When evaluated against existing methods through cross-validation, the proposed model achieved comparable accuracy. It also outperformed several existing models used to identify species other than fungi. CONCLUSIONS: An online prediction server, “funbarRF”, has been established at http://cabgrid.res.in:8080/funbarrf/ for fungal species identification. In addition, an R package, funbarRF (https://cran.r-project.org/web/packages/funbarRF/), is available for prediction on high-throughput sequence data. This work should supplement future efforts in fungal taxonomy assignment based on DNA barcodes.
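The abstract names the feature type but not its exact definition. As an illustration only, here is a minimal Python sketch, assuming that "gapped base pair composition" means the relative frequency of each of the 16 ordered base pairs (x, y) in which y lies g bases after x, computed for several gap values and concatenated into one vector. The function names, the skipping of non-ACGT symbols, and the `max_gap` default are hypothetical and are not taken from the funbarRF package or paper.

```python
from itertools import product

BASES = "ACGT"
# All 16 ordered base pairs (AA, AC, ..., TT) in a fixed order,
# so that feature vectors from different sequences line up column by column.
PAIRS = ["".join(p) for p in product(BASES, repeat=2)]


def gapped_pair_composition(seq: str, gap: int) -> list[float]:
    """Relative frequency of each pair (seq[i], seq[i + gap + 1]).

    gap = 0 gives ordinary adjacent dinucleotides. Pairs containing
    non-ACGT symbols (e.g. N or other IUPAC codes) are skipped
    (an assumption of this sketch).
    """
    seq = seq.upper()
    counts = dict.fromkeys(PAIRS, 0)
    total = 0
    for i in range(len(seq) - gap - 1):
        pair = seq[i] + seq[i + gap + 1]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        # Sequence too short (or only ambiguous bases) for this gap.
        return [0.0] * len(PAIRS)
    return [counts[p] / total for p in PAIRS]


def feature_vector(seq: str, max_gap: int = 3) -> list[float]:
    """Concatenate compositions for gaps 0..max_gap (16 features per gap)."""
    features: list[float] = []
    for g in range(max_gap + 1):
        features.extend(gapped_pair_composition(seq, g))
    return features


if __name__ == "__main__":
    # "ACGT" with gap 0 yields the pairs AC, CG, GT, each with frequency 1/3.
    v = gapped_pair_composition("ACGT", 0)
    print({p: round(f, 3) for p, f in zip(PAIRS, v) if f > 0})
    print(len(feature_vector("ACGTACGTTGCA")))  # 4 gaps x 16 pairs
```

Under this reading, each reference sequence becomes one row of a training matrix labelled with its species, a multiclass Random Forest (for example scikit-learn's `RandomForestClassifier`) is fit on those rows, and query sequences are converted the same way before prediction. Fixed-length composition vectors are what make variable-length ITS sequences usable as RF input.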
id pubmed-6323839
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
source BMC Genet (BMC Genetics), Research Article, BioMed Central, published 2019-01-07
rights © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
topic Research Article