Automatic Identification of Information Quality Metrics in Health News Stories
Main authors: | Al-Jefri, Majed; Evans, Roger; Lee, Joon; Ghezzi, Pietro |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A., 2020 |
Subjects: | Public Health |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7775604/ https://www.ncbi.nlm.nih.gov/pubmed/33392124 http://dx.doi.org/10.3389/fpubh.2020.515347 |
author | Al-Jefri, Majed; Evans, Roger; Lee, Joon; Ghezzi, Pietro |
collection | PubMed |
description | Objective: Many online and printed media publish health news of questionable trustworthiness, and it may be difficult for laypersons to determine the information quality of such articles. The purpose of this work was to propose a methodology for automatically assessing the quality of health-related news stories using natural language processing and machine learning. Materials and Methods: We used a database from the website HealthNewsReview.org, which aims to improve the public dialogue about health care. HealthNewsReview.org developed a set of criteria to critically analyze claims about health care interventions. In this work, we attempt to automate the evaluation process by identifying the indicators of those criteria using natural language processing-based machine learning on a corpus of more than 1,300 news stories. We explored features ranging from simple n-grams to more advanced linguistic features and optimized the feature selection for each task. Additionally, we experimented with the pre-trained natural language model BERT. Results: For some criteria, such as mention of costs, benefits, harms, and “disease-mongering,” the evaluation results were promising, with an F1 measure reaching 81.94%, while for others the results were less satisfactory due to the dataset size, the need for external knowledge, or the subjectivity of the evaluation process. Conclusion: The criteria used here are more challenging than those addressed by previous work, and our aim was to investigate how much more difficult the machine learning task was, and how and why it varied between criteria. For some criteria, the obtained results were promising; however, automated evaluation of the other criteria may not yet replace the manual evaluation process, in which human experts interpret the meaning of the text and draw on external knowledge in their assessment. |
id | pubmed-7775604 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
journal | Front Public Health |
published online | 2020-12-18 |
rights | Copyright © 2020 Al-Jefri, Evans, Lee and Ghezzi. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY), http://creativecommons.org/licenses/by/4.0/. The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
topic | Public Health |