Infectious Disease Relational Data Analysis Using String Grammar Non-Euclidean Relational Fuzzy C-Means

Statistical analysis of infectious diseases is becoming more important, especially for developing prevention policy. This requires epidemiology, the study of how disease occurrence relates to who, when, and where. In this paper, we develop the string grammar non-Euclidean relatio...

Bibliographic Details
Main authors: Budwong, Apiwat, Auephanwiriyakul, Sansanee, Theera-Umpon, Nipon
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8346127/
https://www.ncbi.nlm.nih.gov/pubmed/34360446
http://dx.doi.org/10.3390/ijerph18158153
author Budwong, Apiwat
Auephanwiriyakul, Sansanee
Theera-Umpon, Nipon
collection PubMed
description Statistical analysis of infectious diseases is becoming more important, especially for developing prevention policy. This requires epidemiology, the study of how disease occurrence relates to who, when, and where. In this paper, we develop the string grammar non-Euclidean relational fuzzy C-means (sgNERF-CM) algorithm to find relationships in data on dengue fever, influenza, and hepatitis B virus (HBV) infection across all provinces in Thailand, viewed by age, occupation, and month. Dunn's index is used to select the best models because it identifies compact, well-separated clusters. We compare the results of the sgNERF-CM algorithm with those of the string grammar relational hard C-means (sgRHCM) algorithm. Their numerical counterparts, the relational hard C-means (RHCM) and non-Euclidean relational fuzzy C-means (NERF-CM) algorithms, are also included in the comparison. We found that the sgNERF-CM algorithm is far better than the numerical counterparts and better than the sgRHCM algorithm in most cases. The month-based dataset does not help in finding relationships because the diseases tend to occur all year round. People in different age ranges in different regions of Thailand have different numbers of dengue fever infections. The occupations with a higher chance of dengue fever are the student and teacher groups in the central, north-east, north, and south regions. Additionally, students in all regions except the central region have a high risk of dengue infection. For the influenza dataset, we found that people aged more than 1 year up to 64 years have a higher number of influenza infections in every province. Most occupations in all regions have a higher risk of contracting influenza. For the HBV dataset, people aged 10 to 65 years in all regions have a high risk of contracting the disease. In addition, only the farmer and general contractor groups have a high chance of contracting HBV in all regions.
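The description names Dunn's index as the model-selection criterion for clustering relational (pairwise dissimilarity) data built from strings. The following is a minimal sketch of that idea, not the authors' implementation: it assumes Levenshtein edit distance as the string dissimilarity and hard cluster labels, and the toy strings, the labels, and the function names `levenshtein`, `dissimilarity_matrix`, and `dunn_index` are all hypothetical.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insert, delete, and substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute
        prev = curr
    return prev[-1]


def dissimilarity_matrix(strings):
    """Symmetric matrix R with R[i][j] = distance between strings i and j."""
    n = len(strings)
    R = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        R[i][j] = R[j][i] = levenshtein(strings[i], strings[j])
    return R


def dunn_index(R, labels):
    """Smallest between-cluster distance divided by largest cluster diameter."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    groups = list(clusters.values())
    if len(groups) < 2:
        raise ValueError("Dunn's index needs at least two clusters")
    max_diameter = max(R[i][j] for g in groups for i in g for j in g)
    min_separation = min(R[i][j]
                         for ga, gb in combinations(groups, 2)
                         for i in ga for j in gb)
    if max_diameter == 0:
        return float("inf")  # every cluster is a single repeated point
    return min_separation / max_diameter


# Hypothetical toy data: two obvious groups of strings.
strings = ["aaab", "aabb", "aaaa", "cccd", "ccdd", "cccc"]
R = dissimilarity_matrix(strings)
good = [0, 0, 0, 1, 1, 1]  # matches the visible grouping
bad = [0, 1, 0, 1, 0, 1]   # mixes the two groups

print(dunn_index(R, good))  # 2.0
print(dunn_index(R, bad))   # 0.25
```

A larger value indicates tighter, better-separated clusters, so under these assumptions the partition with the higher index would be preferred. Only the dissimilarity matrix is used, which is why the same scoring applies to any relational clustering output once fuzzy memberships are hardened.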
format Online
Article
Text
id pubmed-8346127
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8346127 2021-08-07 Infectious Disease Relational Data Analysis Using String Grammar Non-Euclidean Relational Fuzzy C-Means Budwong, Apiwat Auephanwiriyakul, Sansanee Theera-Umpon, Nipon Int J Environ Res Public Health Article MDPI 2021-08-01 /pmc/articles/PMC8346127/ /pubmed/34360446 http://dx.doi.org/10.3390/ijerph18158153 Text en © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Infectious Disease Relational Data Analysis Using String Grammar Non-Euclidean Relational Fuzzy C-Means
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8346127/
https://www.ncbi.nlm.nih.gov/pubmed/34360446
http://dx.doi.org/10.3390/ijerph18158153