
Identification and epidemiological characterization of Type-2 diabetes sub-population using an unsupervised machine learning approach

BACKGROUND: Studies on Type-2 Diabetes Mellitus (T2DM) have revealed heterogeneous sub-populations in terms of underlying pathologies. However, the identification of sub-populations in epidemiological datasets remains unexplored. We here focus on the detection of T2DM clusters in epidemiological dat...


Bibliographic Details
Main Authors:	Bej, Saptarshi; Sarkar, Jit; Biswas, Saikat; Mitra, Pabitra; Chakrabarti, Partha; Wolkenhauer, Olaf
Format:	Online Article Text
Language:	English
Published:	Nature Publishing Group UK 2022
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9142500/ https://www.ncbi.nlm.nih.gov/pubmed/35624098 http://dx.doi.org/10.1038/s41387-022-00206-2

collection	PubMed
description	BACKGROUND: Studies on Type-2 Diabetes Mellitus (T2DM) have revealed heterogeneous sub-populations in terms of underlying pathologies. However, the identification of sub-populations in epidemiological datasets remains unexplored. Here we focus on detecting T2DM clusters in epidemiological data, specifically analysing the National Family Health Survey-4 (NFHS-4) dataset from India, which contains a wide spectrum of features, including medical history, dietary and addiction habits, and socio-economic and lifestyle patterns, for 10,125 T2DM patients. METHODS: Epidemiological data are challenging to analyse because of the diverse feature types they contain. Applying the state-of-the-art dimension reduction tool UMAP in the conventional way was found to be ineffective for the NFHS-4 dataset. We implemented a distributed clustering workflow that combines different similarity measure settings of UMAP to cluster continuous, ordinal and nominal features separately. We integrated the reduced dimensions from each feature-type-distributed clustering to obtain an interpretable and unbiased clustering of the data. RESULTS: Our analysis reveals four significant clusters, two of which consist mainly of non-obese T2DM patients. These non-obese clusters have a lower mean age and consist largely of rural residents. Surprisingly, in one of the obese clusters 90% of the T2DM patients practised a non-vegetarian diet, though they did not show an increased intake of plant-based protein-rich foods. CONCLUSIONS: From a methodological perspective, we show that for the diverse data types that are frequent in epidemiological datasets, feature-type-distributed clustering using UMAP is effective, as opposed to the conventional use of the UMAP algorithm. The application of a UMAP-based clustering workflow to this type of dataset is novel in itself. Our findings demonstrate the presence of heterogeneity among Indian T2DM patients with regard to socio-demography and dietary patterns. From our analysis, we conclude that the existence of significant non-obese T2DM sub-populations characterized by younger age groups and economic disadvantage raises the need for different screening criteria for T2DM among rural Indian residents.
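
The feature-type split described under METHODS can be illustrated with a small sketch. This is not the authors' code: the record does not say which similarity measures were used, so the Euclidean, Manhattan and Hamming choices below are assumptions, and the toy records and column names are hypothetical rather than taken from NFHS-4. UMAP itself is left out; in the workflow described, each feature type would be reduced by its own UMAP run and the reduced dimensions combined before clustering. The sketch only shows the first idea, that each feature type gets its own dissimilarity instead of one metric over all columns.

```python
from math import sqrt

# Hypothetical toy records (illustrative values, not NFHS-4 data).
records = [
    {"age": 30.0, "bmi": 20.0, "wealth_index": 1, "diet": "vegetarian", "residence": "rural"},
    {"age": 33.0, "bmi": 24.0, "wealth_index": 3, "diet": "non-vegetarian", "residence": "rural"},
    {"age": 60.0, "bmi": 31.0, "wealth_index": 5, "diet": "non-vegetarian", "residence": "urban"},
]

# Assumed grouping of columns by feature type.
CONTINUOUS = ["age", "bmi"]
ORDINAL = ["wealth_index"]
NOMINAL = ["diet", "residence"]


def euclidean(a, b, cols):
    """Euclidean distance over continuous columns (assumed metric)."""
    return sqrt(sum((a[c] - b[c]) ** 2 for c in cols))


def manhattan(a, b, cols):
    """Manhattan distance over ordinal rank columns (assumed metric)."""
    return sum(abs(a[c] - b[c]) for c in cols)


def hamming(a, b, cols):
    """Fraction of nominal columns on which two records disagree (assumed metric)."""
    return sum(a[c] != b[c] for c in cols) / len(cols)


def distance_matrix(rows, cols, metric):
    """Pairwise distances between all rows, restricted to the given columns."""
    return [[metric(r, s, cols) for s in rows] for r in rows]


def per_type_distances(rows):
    """One distance matrix per feature type, kept separate rather than mixed."""
    return {
        "continuous": distance_matrix(rows, CONTINUOUS, euclidean),
        "ordinal": distance_matrix(rows, ORDINAL, manhattan),
        "nominal": distance_matrix(rows, NOMINAL, hamming),
    }


if __name__ == "__main__":
    for feature_type, matrix in per_type_distances(records).items():
        print(feature_type, matrix)
```

Keeping the three matrices separate means a large age gap cannot swamp a difference in diet or residence, which is the motivation the abstract gives for not applying a single conventional UMAP run to mixed feature types.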
format	Online Article Text
id	pubmed-9142500
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Nature Publishing Group UK
record_format	MEDLINE/PubMed
spelling	pubmed-9142500 2022-05-29 Identification and epidemiological characterization of Type-2 diabetes sub-population using an unsupervised machine learning approach Bej, Saptarshi Sarkar, Jit Biswas, Saikat Mitra, Pabitra Chakrabarti, Partha Wolkenhauer, Olaf Nutr Diabetes Article Nature Publishing Group UK 2022-05-27 /pmc/articles/PMC9142500/ /pubmed/35624098 http://dx.doi.org/10.1038/s41387-022-00206-2 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title	Identification and epidemiological characterization of Type-2 diabetes sub-population using an unsupervised machine learning approach
topic	Article
