Active Annotation in Evaluating the Credibility of Web-Based Medical Information: Guidelines for Creating Training Data Sets for Machine Learning

Bibliographic Details
Main Authors: Nabożny, Aleksandra, Balcerzak, Bartłomiej, Wierzbicki, Adam, Morzy, Mikołaj, Chlabicz, Małgorzata
Format: Online Article Text
Language: English
Published: JMIR Publications 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8665397/
https://www.ncbi.nlm.nih.gov/pubmed/34842547
http://dx.doi.org/10.2196/26065
author Nabożny, Aleksandra
Balcerzak, Bartłomiej
Wierzbicki, Adam
Morzy, Mikołaj
Chlabicz, Małgorzata
collection PubMed
description BACKGROUND: The spread of false medical information on the web is rapidly accelerating. Establishing the credibility of web-based medical information has become a pressing necessity. Machine learning offers a solution that, when properly deployed, can be an effective tool in fighting medical misinformation on the web.
OBJECTIVE: The aim of this study is to present a comprehensive framework for designing and curating machine learning training data sets for web-based medical information credibility assessment. We show how to construct the annotation process. Our main objective is to support researchers from the medical and computer science communities. We offer guidelines on the preparation of data sets for machine learning models that can fight medical misinformation.
METHODS: We begin by providing the annotation protocol for medical experts involved in medical sentence credibility evaluation. The protocol is based on a qualitative study of our experimental data. To address the problem of insufficient initial labels, we propose a preprocessing pipeline for the batch of sentences to be assessed. It consists of representation learning, clustering, and reranking. We call this process active annotation.
RESULTS: We collected more than 10,000 annotations of statements related to selected medical subjects (psychiatry, cholesterol, autism, antibiotics, vaccines, steroids, birth methods, and food allergy testing) for less than US $7000 by employing 9 highly qualified annotators (certified medical professionals), and we release this data set to the general public. We developed an active annotation framework for more efficient annotation of noncredible medical statements. The application of qualitative analysis resulted in a better annotation protocol for our future efforts in data set creation.
CONCLUSIONS: The results of the qualitative analysis support our claims of the efficacy of the presented method.
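The METHODS section above describes a preprocessing pipeline of representation learning, clustering, and reranking that prioritizes sentences for expert annotation. The following is a minimal, hypothetical Python sketch of that kind of pipeline, not the authors' implementation: it uses TF-IDF vectors as a stand-in for learned sentence representations, scikit-learn KMeans for clustering, and a seed-similarity score as an assumed reranking criterion; all function names, parameters, and example sentences are illustrative.

```python
# Sketch of an embed -> cluster -> rerank preprocessing pipeline in the spirit
# of the "active annotation" process described in the abstract. The ranking
# criterion (similarity to seed non-credible statements) is an assumption made
# for illustration only.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity


def prioritize_for_annotation(sentences, seed_noncredible, n_clusters=5):
    """Order candidate sentences so that those most similar to known
    non-credible seed statements are sent to expert annotators first."""
    # 1) Representation: TF-IDF here stands in for learned sentence embeddings.
    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform(sentences + seed_noncredible)
    cand_vecs = X[: len(sentences)]
    seed_vecs = X[len(sentences):]

    # 2) Clustering: group candidate sentences by lexical/topical similarity.
    labels = KMeans(n_clusters=n_clusters, random_state=0, n_init=10).fit_predict(cand_vecs)

    # 3) Reranking: score each candidate by its maximum similarity to any seed
    #    non-credible statement and sort the batch by descending score.
    scores = cosine_similarity(cand_vecs, seed_vecs).max(axis=1)
    order = np.argsort(-scores)
    return [(sentences[i], int(labels[i]), float(scores[i])) for i in order]


if __name__ == "__main__":
    # Toy sentences drawn from the subject areas listed in the abstract.
    candidates = [
        "Vaccines cause autism in young children.",
        "Antibiotics are not effective against viral infections.",
        "High cholesterol has no relation to diet.",
    ]
    seeds = ["Vaccines are dangerous and cause autism."]
    for sent, cluster, score in prioritize_for_annotation(candidates, seeds, n_clusters=2):
        print(f"cluster={cluster} score={score:.2f} {sent}")
```

In this sketch, reranking simply pushes statements resembling known non-credible claims to the front of the annotation queue, which is one plausible way to address the problem of insufficient initial labels mentioned in the abstract.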
format Online
Article
Text
id pubmed-8665397
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-8665397 2021-12-30. JMIR Med Inform, Original Paper. JMIR Publications, 2021-11-26. /pmc/articles/PMC8665397/ /pubmed/34842547 http://dx.doi.org/10.2196/26065 Text en ©Aleksandra Nabożny, Bartłomiej Balcerzak, Adam Wierzbicki, Mikołaj Morzy, Małgorzata Chlabicz. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 26.11.2021. This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on https://medinform.jmir.org/, as well as this copyright and license information must be included.
title Active Annotation in Evaluating the Credibility of Web-Based Medical Information: Guidelines for Creating Training Data Sets for Machine Learning
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8665397/
https://www.ncbi.nlm.nih.gov/pubmed/34842547
http://dx.doi.org/10.2196/26065