Automatic Identification of Information Quality Metrics in Health News Stories

Objective: Many online and printed media publish health news of questionable trustworthiness and it may be difficult for laypersons to determine the information quality of such articles. The purpose of this work was to propose a methodology for the automatic assessment of the quality of health-relat...

Bibliographic Details
Main Authors: Al-Jefri, Majed, Evans, Roger, Lee, Joon, Ghezzi, Pietro
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7775604/
https://www.ncbi.nlm.nih.gov/pubmed/33392124
http://dx.doi.org/10.3389/fpubh.2020.515347
_version_ 1783630505517776896
author Al-Jefri, Majed
Evans, Roger
Lee, Joon
Ghezzi, Pietro
author_facet Al-Jefri, Majed
Evans, Roger
Lee, Joon
Ghezzi, Pietro
author_sort Al-Jefri, Majed
collection PubMed
description Objective: Many online and printed media publish health news of questionable trustworthiness, and it may be difficult for laypersons to determine the information quality of such articles. The purpose of this work was to propose a methodology for the automatic assessment of the quality of health-related news stories using natural language processing and machine learning. Materials and Methods: We used a database from the website HealthNewsReview.org, which aims to improve the public dialogue about health care. HealthNewsReview.org developed a set of criteria to critically analyze claims about health care interventions. In this work, we attempt to automate the evaluation process by identifying the indicators of those criteria using natural language processing-based machine learning on a corpus of more than 1,300 news stories. We explored features ranging from simple n-grams to more advanced linguistic features and optimized the feature selection for each task. Additionally, we experimented with the pre-trained language model BERT. Results: For some criteria, such as mention of costs, benefits, harms, and “disease-mongering,” the evaluation results were promising, with an F1 measure reaching 81.94%, while for others the results were less satisfactory due to the dataset size, the need for external knowledge, or the subjectivity of the evaluation process. Conclusion: The criteria used here are more challenging than those addressed by previous work, and our aim was to investigate how much more difficult the machine learning task was, and how and why it varied between criteria. For some criteria, the obtained results were promising; however, automated evaluation of the other criteria may not yet replace the manual evaluation process, in which human experts interpret text senses and make use of external knowledge in their assessment.
format Online
Article
Text
id pubmed-7775604
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-77756042021-01-02 Automatic Identification of Information Quality Metrics in Health News Stories Al-Jefri, Majed Evans, Roger Lee, Joon Ghezzi, Pietro Front Public Health Public Health Objective: Many online and printed media publish health news of questionable trustworthiness and it may be difficult for laypersons to determine the information quality of such articles. The purpose of this work was to propose a methodology for the automatic assessment of the quality of health-related news stories using natural language processing and machine learning. Materials and Methods: We used a database from the website HealthNewsReview.org that aims to improve the public dialogue about health care. HealthNewsReview.org developed a set of criteria to critically analyze health care interventions' claims. In this work, we attempt to automate the evaluation process by identifying the indicators of those criteria using natural language processing-based machine learning on a corpus of more than 1,300 news stories. We explored features ranging from simple n-grams to more advanced linguistic features and optimized the feature selection for each task. Additionally, we experimented with the use of pre-trained natural language model BERT. Results: For some criteria, such as mention of costs, benefits, harms, and “disease-mongering,” the evaluation results were promising with an F(1) measure reaching 81.94%, while for others the results were less satisfactory due to the dataset size, the need of external knowledge, or the subjectivity in the evaluation process. Conclusion: These used criteria are more challenging than those addressed by previous work, and our aim was to investigate how much more difficult the machine learning task was, and how and why it varied between criteria. 
For some criteria, the obtained results were promising; however, automated evaluation of the other criteria may not yet replace the manual evaluation process where human experts interpret text senses and make use of external knowledge in their assessment. Frontiers Media S.A. 2020-12-18 /pmc/articles/PMC7775604/ /pubmed/33392124 http://dx.doi.org/10.3389/fpubh.2020.515347 Text en Copyright © 2020 Al-Jefri, Evans, Lee and Ghezzi. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Public Health
Al-Jefri, Majed
Evans, Roger
Lee, Joon
Ghezzi, Pietro
Automatic Identification of Information Quality Metrics in Health News Stories
title Automatic Identification of Information Quality Metrics in Health News Stories
title_sort automatic identification of information quality metrics in health news stories
topic Public Health
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7775604/
https://www.ncbi.nlm.nih.gov/pubmed/33392124
http://dx.doi.org/10.3389/fpubh.2020.515347