Machine Learning Applications and Optimization of Clustering Methods Improve the Selection of Descriptors in Blackberry Germplasm Banks
Machine learning (ML) and its multiple applications have comparative advantages for improving the interpretation of knowledge on different agricultural processes. However, there are challenges that impede proper usage, as can be seen in phenotypic characterizations of germplasm banks. The objective...
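Main Authors: | Henao-Rojas, Juan Camilo; Rosero-Alpala, María Gladis; Ortiz-Muñoz, Carolina; Velásquez-Arroyo, Carlos Enrique; Leon-Rueda, William Alfonso; Ramírez-Gil, Joaquín Guillermo |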
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2021 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7911707/ https://www.ncbi.nlm.nih.gov/pubmed/33525314 http://dx.doi.org/10.3390/plants10020247 |
_version_ | 1783656405076541440 |
---|---|
author | Henao-Rojas, Juan Camilo Rosero-Alpala, María Gladis Ortiz-Muñoz, Carolina Velásquez-Arroyo, Carlos Enrique Leon-Rueda, William Alfonso Ramírez-Gil, Joaquín Guillermo |
author_facet | Henao-Rojas, Juan Camilo Rosero-Alpala, María Gladis Ortiz-Muñoz, Carolina Velásquez-Arroyo, Carlos Enrique Leon-Rueda, William Alfonso Ramírez-Gil, Joaquín Guillermo |
author_sort | Henao-Rojas, Juan Camilo |
collection | PubMed |
description | Machine learning (ML) and its multiple applications have comparative advantages for improving the interpretation of knowledge on different agricultural processes. However, there are challenges that impede proper usage, as can be seen in phenotypic characterizations of germplasm banks. The objective of this research was to test and optimize different analysis methods based on ML for the prioritization and selection of morphological descriptors of Rubus spp. Fifty-five descriptors were evaluated in 26 genotypes, and the weight and discriminating capacity of each one were determined. ML methods such as random forest (RF), support vector machines (in the linear and radial forms), and neural networks were optimized and compared. Subsequently, the results were validated with two discriminating methods and their variants: hierarchical agglomerative clustering and K-means. The results indicated that RF presented the highest accuracy (0.768) of the methods evaluated, selecting 11 descriptors based on the purity (Gini index), importance, number of connected trees, and significance (p value < 0.05). Additionally, the K-means method with optimized descriptors based on RF had greater discriminating power on Rubus spp. accessions according to the evaluated statistics. This study presents one application of ML for the optimization of specific morphological variables for plant germplasm bank characterization. |
format | Online Article Text |
id | pubmed-7911707 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-7911707 2021-02-28 Machine Learning Applications and Optimization of Clustering Methods Improve the Selection of Descriptors in Blackberry Germplasm Banks Henao-Rojas, Juan Camilo Rosero-Alpala, María Gladis Ortiz-Muñoz, Carolina Velásquez-Arroyo, Carlos Enrique Leon-Rueda, William Alfonso Ramírez-Gil, Joaquín Guillermo Plants (Basel) Article Machine learning (ML) and its multiple applications have comparative advantages for improving the interpretation of knowledge on different agricultural processes. However, there are challenges that impede proper usage, as can be seen in phenotypic characterizations of germplasm banks. The objective of this research was to test and optimize different analysis methods based on ML for the prioritization and selection of morphological descriptors of Rubus spp. Fifty-five descriptors were evaluated in 26 genotypes, and the weight and discriminating capacity of each one were determined. ML methods such as random forest (RF), support vector machines (in the linear and radial forms), and neural networks were optimized and compared. Subsequently, the results were validated with two discriminating methods and their variants: hierarchical agglomerative clustering and K-means. The results indicated that RF presented the highest accuracy (0.768) of the methods evaluated, selecting 11 descriptors based on the purity (Gini index), importance, number of connected trees, and significance (p value < 0.05). Additionally, the K-means method with optimized descriptors based on RF had greater discriminating power on Rubus spp. accessions according to the evaluated statistics. This study presents one application of ML for the optimization of specific morphological variables for plant germplasm bank characterization. MDPI 2021-01-28 /pmc/articles/PMC7911707/ /pubmed/33525314 http://dx.doi.org/10.3390/plants10020247 Text en © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Henao-Rojas, Juan Camilo Rosero-Alpala, María Gladis Ortiz-Muñoz, Carolina Velásquez-Arroyo, Carlos Enrique Leon-Rueda, William Alfonso Ramírez-Gil, Joaquín Guillermo Machine Learning Applications and Optimization of Clustering Methods Improve the Selection of Descriptors in Blackberry Germplasm Banks |
title | Machine Learning Applications and Optimization of Clustering Methods Improve the Selection of Descriptors in Blackberry Germplasm Banks |
title_full | Machine Learning Applications and Optimization of Clustering Methods Improve the Selection of Descriptors in Blackberry Germplasm Banks |
title_fullStr | Machine Learning Applications and Optimization of Clustering Methods Improve the Selection of Descriptors in Blackberry Germplasm Banks |
title_full_unstemmed | Machine Learning Applications and Optimization of Clustering Methods Improve the Selection of Descriptors in Blackberry Germplasm Banks |
title_short | Machine Learning Applications and Optimization of Clustering Methods Improve the Selection of Descriptors in Blackberry Germplasm Banks |
title_sort | machine learning applications and optimization of clustering methods improve the selection of descriptors in blackberry germplasm banks |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7911707/ https://www.ncbi.nlm.nih.gov/pubmed/33525314 http://dx.doi.org/10.3390/plants10020247 |
work_keys_str_mv | AT henaorojasjuancamilo machinelearningapplicationsandoptimizationofclusteringmethodsimprovetheselectionofdescriptorsinblackberrygermplasmbanks AT roseroalpalamariagladis machinelearningapplicationsandoptimizationofclusteringmethodsimprovetheselectionofdescriptorsinblackberrygermplasmbanks AT ortizmunozcarolina machinelearningapplicationsandoptimizationofclusteringmethodsimprovetheselectionofdescriptorsinblackberrygermplasmbanks AT velasquezarroyocarlosenrique machinelearningapplicationsandoptimizationofclusteringmethodsimprovetheselectionofdescriptorsinblackberrygermplasmbanks AT leonruedawilliamalfonso machinelearningapplicationsandoptimizationofclusteringmethodsimprovetheselectionofdescriptorsinblackberrygermplasmbanks AT ramirezgiljoaquinguillermo machinelearningapplicationsandoptimizationofclusteringmethodsimprovetheselectionofdescriptorsinblackberrygermplasmbanks |
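
The description above mentions ranking descriptors by purity (Gini index). As a rough illustration of that idea only, the sketch below scores each descriptor by the largest Gini impurity decrease that a single threshold split can achieve. This is a deliberate simplification (one split per descriptor, not a random forest) and is not the authors' pipeline; the function names, descriptor names, and toy values are all hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_gini_decrease(values, labels):
    """Largest impurity decrease from one threshold split on a descriptor."""
    parent = gini(labels)
    n = len(labels)
    best = 0.0
    # Try every observed value except the largest as a candidate threshold.
    for threshold in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - child)
    return best


def rank_descriptors(table, labels):
    """Return (descriptor, score) pairs sorted from most to least discriminating."""
    scores = {name: best_gini_decrease(col, labels) for name, col in table.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: six accessions in two groups, two made-up descriptors.
labels = ["A", "A", "A", "B", "B", "B"]
table = {
    "leaf_length": [1.0, 1.2, 1.1, 3.0, 3.2, 2.9],  # separates the groups cleanly
    "spine_count": [5, 7, 6, 6, 5, 7],              # carries no group signal
}
ranking = rank_descriptors(table, labels)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In a workflow like the one the description outlines, the top-ranked descriptors would then be passed to a clustering method such as K-means; that step is omitted here.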