
Application of Multiple Unsupervised Models to Validate Clusters Robustness in Characterizing Smallholder Dairy Farmers


Bibliographic Details
Main authors: Nyambo, Devotha G.; Luhanga, Edith T.; Yonah, Zaipuna O.; Mujibi, Fidalis D. N.
Format: Online article (text)
Language: English
Published: Hindawi 2019
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6334318/
https://www.ncbi.nlm.nih.gov/pubmed/30718979
http://dx.doi.org/10.1155/2019/1020521
collection PubMed
description The heterogeneity of smallholder dairy production systems complicates service provision, information sharing, and the dissemination of new technologies, especially those needed to maximize productivity and profitability. To obtain homogeneous groups within which interventions can be made, it is necessary to define clusters of farmers who undertake similar management activities. This paper explores the robustness of production cluster definition using several unsupervised learning algorithms in order to identify the best approach to defining clusters. Data were collected from 8179 smallholder dairy farms in Ethiopia and Tanzania. Of a total of 500 variables, the 35 used to define production clusters and household membership in these clusters were selected through Principal Component Analysis and domain expert knowledge. Three clustering algorithms, K-means, fuzzy, and Self-Organizing Maps (SOM), were compared in terms of grouping consistency and prediction accuracy. The model with the least household reallocation between clusters for training and testing data was deemed the most robust. Prediction accuracy was obtained by fitting fixed effects models, including production cluster as an effect, to milk yield, milk sales, and choice of breeding method. Results indicated that, for the Ethiopian dataset, clusters derived from the fuzzy algorithm had the highest predictive power (77% for milk yield and 48% for milk sales), while for the Tanzanian data, clusters derived from Self-Organizing Maps performed best. The average cluster membership reallocation for households in Ethiopia was 15%, 12%, and 34% for K-means, SOM, and fuzzy, respectively. Based on the divergent performance of the algorithms evaluated, it is evident that, despite similar information being available for the study populations, the uniqueness of each country's data had an overriding influence on cluster robustness and prediction accuracy.
The results obtained in this study demonstrate the difficulty of generalizing model application and use across countries and production systems, despite seemingly similar information being collected.
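The robustness criterion above, the share of households reallocated between clusters derived from training and testing data, requires aligning cluster IDs before counting moves, because clustering algorithms assign arbitrary labels. The sketch below shows one plausible way to compute such a rate; it is an assumption about the procedure, not the authors' code, and the function name `reallocation_rate` and the toy labels are hypothetical.

```python
from collections import Counter
from itertools import permutations


def reallocation_rate(labels_a, labels_b):
    """Share of households assigned to a different cluster in labels_b than in labels_a.

    Hypothetical helper, not taken from the paper. Cluster IDs are arbitrary, so
    labels_b is first relabelled with the one-to-one mapping that maximises
    agreement with labels_a (brute force over permutations, practical only
    for a small number of clusters).
    """
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("both clusterings must cover the same, non-empty set of households")
    clusters_a = sorted(set(labels_a))
    clusters_b = sorted(set(labels_b))
    # Pad with None so every cluster in labels_b can receive a distinct target,
    # even when labels_b has more clusters than labels_a.
    clusters_a += [None] * max(0, len(clusters_b) - len(clusters_a))
    # Count how often each (cluster in b, cluster in a) pair co-occurs.
    pair_counts = Counter(zip(labels_b, labels_a))
    best_agreement = 0
    for targets in permutations(clusters_a, len(clusters_b)):
        mapping = dict(zip(clusters_b, targets))
        agreement = sum(n for (b, a), n in pair_counts.items() if mapping[b] == a)
        best_agreement = max(best_agreement, agreement)
    return 1 - best_agreement / len(labels_a)


# Toy labels for six households (made up for illustration): cluster IDs 0 and 1
# are swapped between runs and one household moves from cluster 2 to cluster 1.
train_labels = [0, 0, 1, 1, 2, 2]
test_labels = [1, 1, 0, 0, 2, 0]
print(f"{reallocation_rate(train_labels, test_labels):.1%}")  # 16.7%
```

Aligning labels first matters because two runs of the same algorithm can produce identical partitions under different cluster IDs. For more than a handful of clusters, `scipy.optimize.linear_sum_assignment` applied to the cluster contingency table with `maximize=True` finds the same optimal matching without enumerating permutations.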
id pubmed-6334318
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal ScientificWorldJournal, Research Article, Hindawi, published 2019-01-02
rights Copyright © 2019 Devotha G. Nyambo et al. This is an open access article distributed under the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Research Article