
Evaluation of Arabian Vascular Plant Barcodes (rbcL and matK): Precision of Unsupervised and Supervised Learning Methods towards Accurate Identification

Arabia is the largest peninsula in the world, with >3000 species of vascular plants. Not much effort has been made to generate a multi-locus marker barcode library to identify and discriminate the recorded plant species. This study aimed to determine the reliability of the available Arabian plant...


Bibliographic Details
Main Authors: Jamdade, Rahul; Upadhyay, Maulik; Al Shaer, Khawla; Al Harthi, Eman; Al Sallani, Mariam; Al Jasmi, Mariam; Al Ketbi, Asma
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8708657/
https://www.ncbi.nlm.nih.gov/pubmed/34961211
http://dx.doi.org/10.3390/plants10122741
_version_ 1784622740501692416
author Jamdade, Rahul
Upadhyay, Maulik
Al Shaer, Khawla
Al Harthi, Eman
Al Sallani, Mariam
Al Jasmi, Mariam
Al Ketbi, Asma
author_facet Jamdade, Rahul
Upadhyay, Maulik
Al Shaer, Khawla
Al Harthi, Eman
Al Sallani, Mariam
Al Jasmi, Mariam
Al Ketbi, Asma
author_sort Jamdade, Rahul
collection PubMed
description Arabia is the largest peninsula in the world, with >3000 species of vascular plants. Not much effort has been made to generate a multi-locus marker barcode library to identify and discriminate the recorded plant species. This study aimed to determine the reliability of the available Arabian plant barcodes (>1500; rbcL and matK) at the public repository (NCBI GenBank) using the unsupervised and supervised methods. Comparative analysis was carried out with the standard dataset (FINBOL) to assess the methods and markers’ reliability. Our analysis suggests that from the unsupervised method, TaxonDNA’s All Species Barcode criterion (ASB) exhibits the highest accuracy for rbcL barcodes, followed by the matK barcodes using the aligned dataset (FINBOL). However, for the Arabian plant barcode dataset (GBMA), the supervised method performed better than the unsupervised method, where the Random Forest and K-Nearest Neighbor (gappy kernel) classifiers were robust enough. These classifiers successfully recognized true species from both barcode markers belonging to the aligned and alignment-free datasets, respectively. The multi-class classifier showed high species resolution following the two classifiers, though its performance declined when employed to recognize true species. Similar results were observed for the FINBOL dataset through the supervised learning approach; overall, matK marker showed higher accuracy than rbcL. However, the lower rate of species identification in matK in GBMA data could be due to the higher evolutionary rate or gaps and missing data, as observed for the ASB criterion in the FINBOL dataset. Further, a lower number of sequences and singletons could also affect the rate of species resolution, as observed in the GBMA dataset. The GBMA dataset lacks sufficient species membership. We would encourage the taxonomists from the Arabian Peninsula to join our campaign on the Arabian Barcode of Life at the Barcode of Life Data (BOLD) systems. Our efforts together could help improve the rate of species identification for the Arabian Vascular plants.
format Online
Article
Text
id pubmed-8708657
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8708657 2021-12-25 Evaluation of Arabian Vascular Plant Barcodes (rbcL and matK): Precision of Unsupervised and Supervised Learning Methods towards Accurate Identification Jamdade, Rahul Upadhyay, Maulik Al Shaer, Khawla Al Harthi, Eman Al Sallani, Mariam Al Jasmi, Mariam Al Ketbi, Asma Plants (Basel) Article Arabia is the largest peninsula in the world, with >3000 species of vascular plants. Not much effort has been made to generate a multi-locus marker barcode library to identify and discriminate the recorded plant species. This study aimed to determine the reliability of the available Arabian plant barcodes (>1500; rbcL and matK) at the public repository (NCBI GenBank) using the unsupervised and supervised methods. Comparative analysis was carried out with the standard dataset (FINBOL) to assess the methods and markers’ reliability. Our analysis suggests that from the unsupervised method, TaxonDNA’s All Species Barcode criterion (ASB) exhibits the highest accuracy for rbcL barcodes, followed by the matK barcodes using the aligned dataset (FINBOL). However, for the Arabian plant barcode dataset (GBMA), the supervised method performed better than the unsupervised method, where the Random Forest and K-Nearest Neighbor (gappy kernel) classifiers were robust enough. These classifiers successfully recognized true species from both barcode markers belonging to the aligned and alignment-free datasets, respectively. The multi-class classifier showed high species resolution following the two classifiers, though its performance declined when employed to recognize true species. Similar results were observed for the FINBOL dataset through the supervised learning approach; overall, matK marker showed higher accuracy than rbcL. However, the lower rate of species identification in matK in GBMA data could be due to the higher evolutionary rate or gaps and missing data, as observed for the ASB criterion in the FINBOL dataset. Further, a lower number of sequences and singletons could also affect the rate of species resolution, as observed in the GBMA dataset. The GBMA dataset lacks sufficient species membership. We would encourage the taxonomists from the Arabian Peninsula to join our campaign on the Arabian Barcode of Life at the Barcode of Life Data (BOLD) systems. Our efforts together could help improve the rate of species identification for the Arabian Vascular plants. MDPI 2021-12-13 /pmc/articles/PMC8708657/ /pubmed/34961211 http://dx.doi.org/10.3390/plants10122741 Text en © 2021 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Jamdade, Rahul
Upadhyay, Maulik
Al Shaer, Khawla
Al Harthi, Eman
Al Sallani, Mariam
Al Jasmi, Mariam
Al Ketbi, Asma
Evaluation of Arabian Vascular Plant Barcodes (rbcL and matK): Precision of Unsupervised and Supervised Learning Methods towards Accurate Identification
title Evaluation of Arabian Vascular Plant Barcodes (rbcL and matK): Precision of Unsupervised and Supervised Learning Methods towards Accurate Identification
title_full Evaluation of Arabian Vascular Plant Barcodes (rbcL and matK): Precision of Unsupervised and Supervised Learning Methods towards Accurate Identification
title_fullStr Evaluation of Arabian Vascular Plant Barcodes (rbcL and matK): Precision of Unsupervised and Supervised Learning Methods towards Accurate Identification
title_full_unstemmed Evaluation of Arabian Vascular Plant Barcodes (rbcL and matK): Precision of Unsupervised and Supervised Learning Methods towards Accurate Identification
title_short Evaluation of Arabian Vascular Plant Barcodes (rbcL and matK): Precision of Unsupervised and Supervised Learning Methods towards Accurate Identification
title_sort evaluation of arabian vascular plant barcodes (rbcl and matk): precision of unsupervised and supervised learning methods towards accurate identification
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8708657/
https://www.ncbi.nlm.nih.gov/pubmed/34961211
http://dx.doi.org/10.3390/plants10122741