Implementation of machine learning in DNA barcoding for determining the plant family taxonomy

Bibliographic Details
Main authors: Riza, Lala Septem; Zain, Muhammad Iqbal; Izzuddin, Ahmad; Prasetyo, Yudi; Hidayat, Topik; Abu Samah, Khyrina Airin Fariza
Format: Online, Article, Text
Language: English
Published: Elsevier, 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10520734/
https://www.ncbi.nlm.nih.gov/pubmed/37767518
http://dx.doi.org/10.1016/j.heliyon.2023.e20161
collection PubMed
description The DNA barcoding approach has been used extensively in taxonomy and phylogenetics. Differences in certain DNA sequences can differentiate organisms and help classify them into taxa. The approach has been used in cases of taxonomic dispute where morphology alone is insufficient. This research aimed to use hierarchical clustering, an unsupervised machine learning method, to determine and resolve disputes in plant family taxonomy. We take as a case study Leguminosae, which some authors have historically classified into three families (Fabaceae, Caesalpiniaceae, and Mimosaceae) and others into a single family (Leguminosae). The study is divided into four phases: (i) data collection, (ii) data preprocessing, (iii) finding the best distance method, and (iv) determining the disputed family. The data were collected from several sources, including the National Center for Biotechnology Information (NCBI), journals, and websites. The data for validating the methods were collected from NCBI and were used to determine the best distance method for differentiating families or genera. The data for the Leguminosae case study were collected from journals and a website. In our experiments, the Pearson method was the best distance method for clustering plant ITS sequences in terms of both accuracy and computational cost. We then used the Pearson method to resolve the disputed family classification of Leguminosae, and our results indicate that Leguminosae should be grouped into a single family.
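The abstract names hierarchical clustering with a Pearson-based distance but does not say how sequences were turned into numbers, which linkage rule was used, or how the tree was cut into groups. The sketch below is a minimal, stdlib-only Python illustration under those assumptions, not the authors' code: each sequence becomes a 3-mer count vector (a hypothetical encoding choice), the distance between two vectors is 1 - r where r is the Pearson correlation coefficient, and average linkage merges clusters until a requested number remains. The function names (kmer_vector, pearson_distance, average_linkage, cluster_sequences) and the toy sequences are illustrative only; the toy strings are made up and are not real ITS data.

```python
from itertools import product
from math import sqrt


def kmer_vector(seq: str, k: int = 3) -> list[int]:
    """Count every possible k-mer over ACGT (hypothetical encoding, not from the paper)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip k-mers containing N or other ambiguity codes
            counts[j] += 1
    return counts


def pearson_distance(x: list[float], y: list[float]) -> float:
    """Return 1 - r, where r is the Pearson correlation (0 means perfectly correlated)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 1.0  # r is undefined for constant vectors; treat them as uncorrelated
    return 1.0 - cov / (sx * sy)


def average_linkage(dist: list[list[float]], n_samples: int, n_clusters: int) -> list[int]:
    """Naive agglomerative clustering with average linkage; returns one label per sample."""
    clusters = [[i] for i in range(n_samples)]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # mean pairwise distance between the members of the two clusters
                d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])  # merge the closest pair (b > a, so index a stays valid)
        del clusters[b]
    labels = [0] * n_samples
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def cluster_sequences(seqs: dict[str, str], n_clusters: int, k: int = 3) -> dict[str, int]:
    """Encode sequences, build the Pearson distance matrix, and cluster them."""
    names = list(seqs)
    vectors = [kmer_vector(seqs[name], k) for name in names]
    dist = [[pearson_distance(u, v) for v in vectors] for u in vectors]
    labels = average_linkage(dist, len(names), n_clusters)
    return dict(zip(names, labels))


if __name__ == "__main__":
    # Made-up toy sequences, not real ITS data.
    toy = {
        "s1": "ACGTACGTACGTACGTAAAA",
        "s2": "ACGTACGTACGTACGTAAAT",
        "s3": "GGGCCCGGGCCCGGGCCCGG",
        "s4": "GGGCCCGGGCCCGGGCCCGC",
    }
    print(cluster_sequences(toy, n_clusters=2))  # → {'s1': 0, 's2': 0, 's3': 1, 's4': 1}
```

The pairwise loop is cubic in the number of sequences and is meant only to make the steps visible. For real data, SciPy's `scipy.spatial.distance.pdist` with `metric="correlation"` computes the same 1 - r distance, and `scipy.cluster.hierarchy.linkage` with `fcluster` performs the clustering and cut.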
id pubmed-10520734
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-10520734 2023-09-27 Heliyon Research Article Elsevier 2023-09-21 Text en © 2023 The Authors. Published by Elsevier Ltd.
This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
topic Research Article