Evaluation of Arabian Vascular Plant Barcodes (rbcL and matK): Precision of Unsupervised and Supervised Learning Methods towards Accurate Identification
Arabia is the largest peninsula in the world, with >3000 species of vascular plants. Not much effort has been made to generate a multi-locus marker barcode library to identify and discriminate the recorded plant species. This study aimed to determine the reliability of the available Arabian plant...
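Main Authors: | Jamdade, Rahul; Upadhyay, Maulik; Al Shaer, Khawla; Al Harthi, Eman; Al Sallani, Mariam; Al Jasmi, Mariam; Al Ketbi, Asma |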
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2021 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8708657/ https://www.ncbi.nlm.nih.gov/pubmed/34961211 http://dx.doi.org/10.3390/plants10122741 |
_version_ | 1784622740501692416 |
---|---|
author | Jamdade, Rahul Upadhyay, Maulik Al Shaer, Khawla Al Harthi, Eman Al Sallani, Mariam Al Jasmi, Mariam Al Ketbi, Asma |
author_sort | Jamdade, Rahul |
collection | PubMed |
description | Arabia is the largest peninsula in the world, with >3000 species of vascular plants. Not much effort has been made to generate a multi-locus marker barcode library to identify and discriminate the recorded plant species. This study aimed to determine the reliability of the available Arabian plant barcodes (>1500; rbcL and matK) at the public repository (NCBI GenBank) using the unsupervised and supervised methods. Comparative analysis was carried out with the standard dataset (FINBOL) to assess the methods and markers’ reliability. Our analysis suggests that from the unsupervised method, TaxonDNA’s All Species Barcode criterion (ASB) exhibits the highest accuracy for rbcL barcodes, followed by the matK barcodes using the aligned dataset (FINBOL). However, for the Arabian plant barcode dataset (GBMA), the supervised method performed better than the unsupervised method, where the Random Forest and K-Nearest Neighbor (gappy kernel) classifiers were robust enough. These classifiers successfully recognized true species from both barcode markers belonging to the aligned and alignment-free datasets, respectively. The multi-class classifier showed high species resolution following the two classifiers, though its performance declined when employed to recognize true species. Similar results were observed for the FINBOL dataset through the supervised learning approach; overall, matK marker showed higher accuracy than rbcL. However, the lower rate of species identification in matK in GBMA data could be due to the higher evolutionary rate or gaps and missing data, as observed for the ASB criterion in the FINBOL dataset. Further, a lower number of sequences and singletons could also affect the rate of species resolution, as observed in the GBMA dataset. The GBMA dataset lacks sufficient species membership. We would encourage the taxonomists from the Arabian Peninsula to join our campaign on the Arabian Barcode of Life at the Barcode of Life Data (BOLD) systems. Our efforts together could help improve the rate of species identification for the Arabian Vascular plants. |
format | Online Article Text |
id | pubmed-8708657 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-8708657 2021-12-25 Evaluation of Arabian Vascular Plant Barcodes (rbcL and matK): Precision of Unsupervised and Supervised Learning Methods towards Accurate Identification Jamdade, Rahul Upadhyay, Maulik Al Shaer, Khawla Al Harthi, Eman Al Sallani, Mariam Al Jasmi, Mariam Al Ketbi, Asma Plants (Basel) Article Arabia is the largest peninsula in the world, with >3000 species of vascular plants. Not much effort has been made to generate a multi-locus marker barcode library to identify and discriminate the recorded plant species. This study aimed to determine the reliability of the available Arabian plant barcodes (>1500; rbcL and matK) at the public repository (NCBI GenBank) using the unsupervised and supervised methods. Comparative analysis was carried out with the standard dataset (FINBOL) to assess the methods and markers’ reliability. Our analysis suggests that from the unsupervised method, TaxonDNA’s All Species Barcode criterion (ASB) exhibits the highest accuracy for rbcL barcodes, followed by the matK barcodes using the aligned dataset (FINBOL). However, for the Arabian plant barcode dataset (GBMA), the supervised method performed better than the unsupervised method, where the Random Forest and K-Nearest Neighbor (gappy kernel) classifiers were robust enough. These classifiers successfully recognized true species from both barcode markers belonging to the aligned and alignment-free datasets, respectively. The multi-class classifier showed high species resolution following the two classifiers, though its performance declined when employed to recognize true species. Similar results were observed for the FINBOL dataset through the supervised learning approach; overall, matK marker showed higher accuracy than rbcL. However, the lower rate of species identification in matK in GBMA data could be due to the higher evolutionary rate or gaps and missing data, as observed for the ASB criterion in the FINBOL dataset. Further, a lower number of sequences and singletons could also affect the rate of species resolution, as observed in the GBMA dataset. The GBMA dataset lacks sufficient species membership. We would encourage the taxonomists from the Arabian Peninsula to join our campaign on the Arabian Barcode of Life at the Barcode of Life Data (BOLD) systems. Our efforts together could help improve the rate of species identification for the Arabian Vascular plants. MDPI 2021-12-13 /pmc/articles/PMC8708657/ /pubmed/34961211 http://dx.doi.org/10.3390/plants10122741 Text en © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Jamdade, Rahul Upadhyay, Maulik Al Shaer, Khawla Al Harthi, Eman Al Sallani, Mariam Al Jasmi, Mariam Al Ketbi, Asma Evaluation of Arabian Vascular Plant Barcodes (rbcL and matK): Precision of Unsupervised and Supervised Learning Methods towards Accurate Identification |
title | Evaluation of Arabian Vascular Plant Barcodes (rbcL and matK): Precision of Unsupervised and Supervised Learning Methods towards Accurate Identification |
title_sort | evaluation of arabian vascular plant barcodes (rbcl and matk): precision of unsupervised and supervised learning methods towards accurate identification |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8708657/ https://www.ncbi.nlm.nih.gov/pubmed/34961211 http://dx.doi.org/10.3390/plants10122741 |