
Predicting Writing Styles of Web-Based Materials for Children’s Health Education Using the Selection of Semantic Features: Machine Learning Approach

BACKGROUND: Medical writing styles can have an impact on the understandability of health educational resources. Amid current web-based health information research, there is a dearth of research-based evidence that demonstrates what constitutes the best practice of the development of web-based health...


Bibliographic Details
Main Authors:	Xie, Wenxiu; Ji, Meng; Liu, Yanmeng; Hao, Tianyong; Chow, Chi-Yin
Format:	Online Article Text
Language:	English
Published:	JMIR Publications 2021
Subjects:	Original Paper
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8367110/ https://www.ncbi.nlm.nih.gov/pubmed/34292167 http://dx.doi.org/10.2196/30115

author	Xie, Wenxiu; Ji, Meng; Liu, Yanmeng; Hao, Tianyong; Chow, Chi-Yin
author_facet	Xie, Wenxiu; Ji, Meng; Liu, Yanmeng; Hao, Tianyong; Chow, Chi-Yin
author_sort	Xie, Wenxiu
collection	PubMed
description	BACKGROUND: Medical writing styles can have an impact on the understandability of health educational resources. Amid current web-based health information research, there is a dearth of research-based evidence that demonstrates what constitutes the best practice of the development of web-based health resources on children’s health promotion and education. OBJECTIVE: Using authoritative and highly influential web-based children’s health educational resources from the Nemours Foundation, the largest not-for-profit organization promoting children’s health and well-being, we aimed to develop machine learning algorithms to discriminate and predict the writing styles of health educational resources on children versus adult health promotion using a variety of health educational resources aimed at the general public. METHODS: The selection of natural language features as predictor variables of algorithms went through initial automatic feature selection using ridge classifier, support vector machine, extreme gradient boost tree, and recursive feature elimination followed by revision by education experts. We compared algorithms using the automatically selected (n=19) and linguistically enhanced (n=20) feature sets, using the initial feature set (n=115) as the baseline. RESULTS: Using five-fold cross-validation, compared with the baseline (115 features), the Gaussian Naive Bayes model (20 features) achieved statistically higher mean sensitivity (P=.02; 95% CI −0.016 to 0.1929), mean specificity (P=.02; 95% CI −0.016 to 0.199), mean area under the receiver operating characteristic curve (P=.02; 95% CI −0.007 to 0.140), and mean macro F1 (P=.006; 95% CI 0.016-0.167). The statistically improved performance of the final model (20 features) is in contrast to the statistically insignificant changes between the original feature set (n=115) and the automatically selected features (n=19): mean sensitivity (P=.13; 95% CI −0.1699 to 0.0681), mean specificity (P=.10; 95% CI −0.1389 to 0.4017), mean area under the receiver operating characteristic curve (P=.008; 95% CI 0.0059-0.1126), and mean macro F1 (P=.98; 95% CI −0.0555 to 0.0548). This demonstrates the importance and effectiveness of combining automatic feature selection and expert-based linguistic revision to develop the most effective machine learning algorithms from high-dimensional data sets. CONCLUSIONS: We developed new evaluation tools for the discrimination and prediction of writing styles of web-based health resources for children’s health education and promotion among parents and caregivers of children. User-adaptive automatic assessment of web-based health content holds great promise for distant and remote health education among young readers. Our study leveraged the precision and adaptability of machine learning algorithms and insights from health linguistics to help advance this significant yet understudied area of research.
format	Online Article Text
id	pubmed-8367110
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	JMIR Publications
record_format	MEDLINE/PubMed
spelling	pubmed-8367110 2021-08-24 Predicting Writing Styles of Web-Based Materials for Children’s Health Education Using the Selection of Semantic Features: Machine Learning Approach Xie, Wenxiu; Ji, Meng; Liu, Yanmeng; Hao, Tianyong; Chow, Chi-Yin JMIR Med Inform Original Paper JMIR Publications 2021-07-22 /pmc/articles/PMC8367110/ /pubmed/34292167 http://dx.doi.org/10.2196/30115 Text en ©Wenxiu Xie, Meng Ji, Yanmeng Liu, Tianyong Hao, Chi-Yin Chow. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 22.07.2021. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on https://medinform.jmir.org/, as well as this copyright and license information must be included.
title	Predicting Writing Styles of Web-Based Materials for Children’s Health Education Using the Selection of Semantic Features: Machine Learning Approach
topic	Original Paper
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8367110/ https://www.ncbi.nlm.nih.gov/pubmed/34292167 http://dx.doi.org/10.2196/30115
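The evaluation design summarized in the abstract (a Gaussian Naive Bayes classifier scored by five-fold cross-validation on a full feature set versus a reduced one) can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic, the helper names `make_toy_data`, `fit_gnb`, `predict_gnb`, and `five_fold_scores` are hypothetical, and the reduced feature subset is hard-coded here rather than chosen by recursive feature elimination or expert revision.

```python
import math
import random
from statistics import fmean

def make_toy_data(n_per_class=100, seed=0):
    """Synthetic stand-in data (assumption): columns 0-1 separate the classes, 2-5 are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            informative = [rng.gauss(4.0 * label, 1.0) for _ in range(2)]
            noise = [rng.gauss(0.0, 1.0) for _ in range(4)]
            X.append(informative + noise)
            y.append(label)
    return X, y

def fit_gnb(X, y):
    """Estimate a prior plus per-feature mean and variance for each class."""
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        cols = list(zip(*rows))
        mus = [fmean(c) for c in cols]
        # A small constant keeps every variance strictly positive.
        variances = [fmean((v - m) ** 2 for v in c) + 1e-9 for c, m in zip(cols, mus)]
        model[label] = (len(rows) / len(y), mus, variances)
    return model

def predict_gnb(model, x):
    """Return the class with the highest Gaussian log-posterior."""
    def log_posterior(label):
        prior, mus, variances = model[label]
        total = math.log(prior)
        for v, m, s2 in zip(x, mus, variances):
            total += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
        return total
    return max(model, key=log_posterior)

def five_fold_scores(X, y, columns, k=5, seed=0):
    """Mean sensitivity and specificity over k stratified folds, using only `columns`."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in (0, 1):
        members = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(members)
        for pos, i in enumerate(members):
            folds[pos % k].append(i)  # deal each class round-robin into the folds
    rows = [[x[c] for c in columns] for x in X]
    sens, spec = [], []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        model = fit_gnb([rows[i] for i in train], [y[i] for i in train])
        tp = fn = tn = fp = 0
        for i in fold:
            pred = predict_gnb(model, rows[i])
            if y[i] == 1:
                tp, fn = tp + (pred == 1), fn + (pred == 0)
            else:
                tn, fp = tn + (pred == 0), fp + (pred == 1)
        sens.append(tp / (tp + fn))
        spec.append(tn / (tn + fp))
    return fmean(sens), fmean(spec)

X, y = make_toy_data()
full_scores = five_fold_scores(X, y, columns=list(range(6)))   # baseline: all features
reduced_scores = five_fold_scores(X, y, columns=[0, 1])        # hand-picked subset
print("all features:", full_scores)
print("reduced set: ", reduced_scores)
```

Stratifying the folds guarantees that every held-out fold contains both classes, so sensitivity and specificity are always defined. A real replication would substitute the study's 115, 19, and 20 linguistic features for the toy columns.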
