
Machine Learning Applications and Optimization of Clustering Methods Improve the Selection of Descriptors in Blackberry Germplasm Banks

Machine learning (ML) and its multiple applications have comparative advantages for improving the interpretation of knowledge on different agricultural processes. However, there are challenges that impede proper usage, as can be seen in phenotypic characterizations of germplasm banks. The objective...


Bibliographic Details
Main authors: Henao-Rojas, Juan Camilo, Rosero-Alpala, María Gladis, Ortiz-Muñoz, Carolina, Velásquez-Arroyo, Carlos Enrique, Leon-Rueda, William Alfonso, Ramírez-Gil, Joaquín Guillermo
Format: Online Article Text
Language: English
Published: MDPI 2021
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7911707/
https://www.ncbi.nlm.nih.gov/pubmed/33525314
http://dx.doi.org/10.3390/plants10020247
collection PubMed
description Machine learning (ML) and its multiple applications have comparative advantages for improving the interpretation of knowledge on different agricultural processes. However, there are challenges that impede proper usage, as can be seen in phenotypic characterizations of germplasm banks. The objective of this research was to test and optimize different ML-based analysis methods for the prioritization and selection of morphological descriptors of Rubus spp. A total of 55 descriptors were evaluated in 26 genotypes, and the weight and discriminating capacity of each were determined. ML methods such as random forest (RF), support vector machines (in linear and radial forms), and neural networks were optimized and compared. Subsequently, the results were validated with two discriminating methods and their variants: hierarchical agglomerative clustering and K-means. The results indicated that RF presented the highest accuracy (0.768) of the methods evaluated, selecting 11 descriptors based on purity (Gini index), importance, number of connected trees, and significance (p value < 0.05). Additionally, the K-means method with RF-optimized descriptors had greater discriminating power on Rubus spp. accessions according to the evaluated statistics. This study presents one application of ML for the optimization of specific morphological variables for plant germplasm bank characterization.
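The description above mentions selecting descriptors by purity (Gini index). As a rough illustration of that idea only, the sketch below ranks descriptors by the largest Gini impurity decrease a single threshold split can achieve. This is a simplified stand-in, not the authors' random forest procedure; the function names, descriptor names, and toy values are all hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split_gain(values, labels):
    """Largest Gini impurity decrease from one threshold split on a descriptor."""
    parent = gini(labels)
    n = len(labels)
    best = 0.0
    # Try every observed value except the largest as a "<= threshold" split.
    for threshold in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - weighted)
    return best


def rank_descriptors(table, labels):
    """Return (name, gain) pairs sorted from most to least discriminating."""
    gains = {name: best_split_gain(col, labels) for name, col in table.items()}
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: 6 accessions, 3 made-up descriptors, 2 groups.
toy_table = {
    "leaf_length": [4.1, 4.3, 4.0, 7.9, 8.2, 8.0],
    "spine_count": [3, 9, 5, 4, 8, 6],
    "fruit_weight": [2.0, 2.2, 5.9, 6.1, 6.0, 6.3],
}
toy_labels = ["A", "A", "A", "B", "B", "B"]

ranking = rank_descriptors(toy_table, toy_labels)
selected = [name for name, gain in ranking[:2]]  # keep the two best descriptors
```

A random forest averages impurity decreases of this kind over many trees and random feature subsets, so its importances are more robust than a single-split score. The retained descriptors could then be passed to a clustering method such as K-means, as the description outlines.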
format Online
Article
Text
id pubmed-7911707
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-7911707 2021-02-28 Plants (Basel) Article MDPI 2021-01-28 Text en © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).