Machine learning in medicine: a practical introduction to natural language processing

BACKGROUND: Unstructured text, including medical records, patient feedback, and social media comments, can be a rich source of data for clinical research. Natural language processing (NLP) describes a set of techniques used to convert passages of written text into interpretable datasets that can be...

Bibliographic Details
Main authors:	Harrison, Conrad J., Sidey-Gibbons, Chris J.
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2021
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8325804/ https://www.ncbi.nlm.nih.gov/pubmed/34332525 http://dx.doi.org/10.1186/s12874-021-01347-1

_version_	1783731622789513216
author	Harrison, Conrad J. Sidey-Gibbons, Chris J.
author_facet	Harrison, Conrad J. Sidey-Gibbons, Chris J.
author_sort	Harrison, Conrad J.
collection	PubMed
description	BACKGROUND: Unstructured text, including medical records, patient feedback, and social media comments, can be a rich source of data for clinical research. Natural language processing (NLP) describes a set of techniques used to convert passages of written text into interpretable datasets that can be analysed by statistical and machine learning (ML) models. The purpose of this paper is to provide a practical introduction to contemporary techniques for the analysis of text-data, using freely-available software. METHODS: We performed three NLP experiments using publicly-available data obtained from medicine review websites. First, we conducted lexicon-based sentiment analysis on open-text patient reviews of four drugs: Levothyroxine, Viagra, Oseltamivir and Apixaban. Next, we used unsupervised ML (latent Dirichlet allocation, LDA) to identify similar drugs in the dataset, based solely on their reviews. Finally, we developed three supervised ML algorithms to predict whether a drug review was associated with a positive or negative rating. These algorithms were: a regularised logistic regression, a support vector machine (SVM), and an artificial neural network (ANN). We compared the performance of these algorithms in terms of classification accuracy, area under the receiver operating characteristic curve (AUC), sensitivity and specificity. RESULTS: Levothyroxine and Viagra were reviewed with a higher proportion of positive sentiments than Oseltamivir and Apixaban. One of the three LDA clusters clearly represented drugs used to treat mental health problems. A common theme suggested by this cluster was drugs taking weeks or months to work. Another cluster clearly represented drugs used as contraceptives. Supervised machine learning algorithms predicted positive or negative drug ratings with classification accuracies ranging from 0.664, 95% CI [0.608, 0.716] for the regularised regression to 0.720, 95% CI [0.664, 0.776] for the SVM. CONCLUSIONS: In this paper, we present a conceptual overview of common techniques used to analyse large volumes of text, and provide reproducible code that can be readily applied to other research studies using open-source software. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12874-021-01347-1.
format	Online Article Text
id	pubmed-8325804
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-8325804 2021-08-02 Machine learning in medicine: a practical introduction to natural language processing Harrison, Conrad J. Sidey-Gibbons, Chris J. BMC Med Res Methodol Research BACKGROUND: Unstructured text, including medical records, patient feedback, and social media comments, can be a rich source of data for clinical research. Natural language processing (NLP) describes a set of techniques used to convert passages of written text into interpretable datasets that can be analysed by statistical and machine learning (ML) models. The purpose of this paper is to provide a practical introduction to contemporary techniques for the analysis of text-data, using freely-available software. METHODS: We performed three NLP experiments using publicly-available data obtained from medicine review websites. First, we conducted lexicon-based sentiment analysis on open-text patient reviews of four drugs: Levothyroxine, Viagra, Oseltamivir and Apixaban. Next, we used unsupervised ML (latent Dirichlet allocation, LDA) to identify similar drugs in the dataset, based solely on their reviews. Finally, we developed three supervised ML algorithms to predict whether a drug review was associated with a positive or negative rating. These algorithms were: a regularised logistic regression, a support vector machine (SVM), and an artificial neural network (ANN). We compared the performance of these algorithms in terms of classification accuracy, area under the receiver operating characteristic curve (AUC), sensitivity and specificity. RESULTS: Levothyroxine and Viagra were reviewed with a higher proportion of positive sentiments than Oseltamivir and Apixaban. One of the three LDA clusters clearly represented drugs used to treat mental health problems. A common theme suggested by this cluster was drugs taking weeks or months to work. Another cluster clearly represented drugs used as contraceptives. Supervised machine learning algorithms predicted positive or negative drug ratings with classification accuracies ranging from 0.664, 95% CI [0.608, 0.716] for the regularised regression to 0.720, 95% CI [0.664, 0.776] for the SVM. CONCLUSIONS: In this paper, we present a conceptual overview of common techniques used to analyse large volumes of text, and provide reproducible code that can be readily applied to other research studies using open-source software. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12874-021-01347-1. BioMed Central 2021-07-31 /pmc/articles/PMC8325804/ /pubmed/34332525 http://dx.doi.org/10.1186/s12874-021-01347-1 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle	Research Harrison, Conrad J. Sidey-Gibbons, Chris J. Machine learning in medicine: a practical introduction to natural language processing
title	Machine learning in medicine: a practical introduction to natural language processing
title_full	Machine learning in medicine: a practical introduction to natural language processing
title_fullStr	Machine learning in medicine: a practical introduction to natural language processing
title_full_unstemmed	Machine learning in medicine: a practical introduction to natural language processing
title_short	Machine learning in medicine: a practical introduction to natural language processing
title_sort	machine learning in medicine: a practical introduction to natural language processing
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8325804/ https://www.ncbi.nlm.nih.gov/pubmed/34332525 http://dx.doi.org/10.1186/s12874-021-01347-1
work_keys_str_mv	AT harrisonconradj machinelearninginmedicineapracticalintroductiontonaturallanguageprocessing AT sideygibbonschrisj machinelearninginmedicineapracticalintroductiontonaturallanguageprocessing

Similar items