funbarRF: DNA barcode-based fungal species prediction using multiclass Random Forest supervised learning model
Main authors: | Meher, Prabina Kumar; Sahu, Tanmaya Kumar; Gahoi, Shachi; Tomar, Ruchi; Rao, Atmakuri Ramakrishna |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2019 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6323839/ https://www.ncbi.nlm.nih.gov/pubmed/30616524 http://dx.doi.org/10.1186/s12863-018-0710-z |
author | Meher, Prabina Kumar; Sahu, Tanmaya Kumar; Gahoi, Shachi; Tomar, Ruchi; Rao, Atmakuri Ramakrishna |
---|---|
collection | PubMed |
description | BACKGROUND: Identification of unknown fungal species aids the conservation of fungal diversity. As many fungal species cannot be cultured, morphological identification of those species is almost impossible. However, the DNA barcoding technique can be employed to identify such species. For fungal taxonomy prediction, the ITS (internal transcribed spacer) region of rDNA (ribosomal DNA) is used as the barcode. Though computational prediction of fungal species has become feasible with the availability of a huge volume of barcode sequences in the public domain, it remains challenging due to the high degree of variability among ITS regions within species. RESULTS: A Random Forest (RF)-based predictor was built for identification of unknown fungal species. The reference and query sequences were mapped onto numeric features based on gapped base pair compositions, and then used as training and test sets, respectively, for prediction of fungal species using RF. More than 85% accuracy was found when 4 sequences per species in the reference set were utilized, whereas accuracy stabilized at ~88% when ≥7 sequences per species in the reference set were used to train the model. The proposed model achieved comparable accuracy when evaluated against existing methods through a cross-validation procedure. The proposed model also outperformed several existing models used for identification of species other than fungi. CONCLUSIONS: An online prediction server “funbarRF” has been established at http://cabgrid.res.in:8080/funbarrf/ for fungal species identification. In addition, an R package funbarRF (https://cran.r-project.org/web/packages/funbarRF/) is available for prediction using high-throughput sequence data. The effort put into this work will certainly supplement future endeavors in the direction of fungal taxonomy assignment based on DNA barcodes. |
format | Online Article Text |
id | pubmed-6323839 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6323839 2019-01-11. BMC Genet, Research Article. BioMed Central, 2019-01-07. /pmc/articles/PMC6323839/ /pubmed/30616524 http://dx.doi.org/10.1186/s12863-018-0710-z. Text, en. © The Author(s) 2019. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | funbarRF: DNA barcode-based fungal species prediction using multiclass Random Forest supervised learning model |
topic | Research Article |
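
The description above states that reference and query sequences were mapped onto numeric features based on gapped base pair compositions before Random Forest classification, but the record does not define the exact encoding. The following is a minimal sketch under a stated assumption: for each gap g from 0 to a hypothetical `max_gap`, it counts the 16 ordered base pairs (x[i], x[i+g+1]) along the sequence and converts the counts to relative frequencies. The function name `gapped_pair_composition`, the `max_gap` default, and the choice to skip pairs containing non-ACGT symbols are illustrative assumptions, not the API or exact feature layout of the funbarRF package.

```python
from itertools import product

BASES = "ACGT"
# All 16 ordered base pairs in a fixed order: "AA", "AC", ..., "TT".
PAIRS = ["".join(p) for p in product(BASES, repeat=2)]


def gapped_pair_composition(seq: str, max_gap: int = 3) -> list[float]:
    """Relative frequencies of base pairs (x[i], x[i+g+1]) for g = 0..max_gap.

    Assumed (hypothetical) layout: 16 values per gap, concatenated in gap
    order, for 16 * (max_gap + 1) features in total. Pairs that contain a
    symbol outside ACGT (e.g. N or other IUPAC ambiguity codes) are skipped.
    """
    seq = seq.upper()
    features: list[float] = []
    for gap in range(max_gap + 1):
        counts = dict.fromkeys(PAIRS, 0)
        step = gap + 1  # distance between the two bases of a pair
        for i in range(len(seq) - step):
            pair = seq[i] + seq[i + step]
            if pair in counts:
                counts[pair] += 1
        total = sum(counts.values())
        # Normalize to frequencies; a block with no valid pairs stays all zeros.
        features.extend(counts[p] / total if total else 0.0 for p in PAIRS)
    return features


if __name__ == "__main__":
    vec = gapped_pair_composition("ACGTAC", max_gap=1)
    print(len(vec))  # 32 features: 16 pairs x 2 gaps
```

Whatever its length, each sequence becomes a fixed-length vector of 16 × (max_gap + 1) values. This is what lets variable-length ITS sequences serve as rows of a training matrix for a multiclass classifier such as a Random Forest, with species labels as the classes.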