Unsupervised natural language processing in the identification of patients with suspected COVID-19 infection

Patients with post-COVID-19 syndrome benefit from health promotion programs. Their rapid identification is important for the cost-effective use of these programs. Traditional identification techniques perform poorly, especially in pandemics. A descriptive observational study was carried out using 105...

Bibliographic Details
Main Authors: da Silva, Rildo Pinto; Pollettini, Juliana Tarossi; Pazin, Antonio
Format: Online Article Text
Language: English
Published: Escola Nacional de Saúde Pública Sergio Arouca, Fundação Oswaldo Cruz 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10695477/
http://dx.doi.org/10.1590/0102-311XEN243722
collection PubMed
description Patients with post-COVID-19 syndrome benefit from health promotion programs, and identifying them quickly is important for the cost-effective use of these programs. Traditional identification techniques perform poorly, especially in pandemics. A descriptive observational study was carried out on 105,008 prior authorizations paid by a private health care provider, applying an unsupervised natural language processing method, topic modeling, to identify patients suspected of being infected with COVID-19. A total of 6 models were generated: 3 using the BERTopic algorithm and 3 using Word2Vec. The BERTopic model creates disease groups automatically. In the Word2Vec model, manual analysis of the first 100 cases of each topic was necessary to define the topics related to COVID-19. The BERTopic model with more than 1,000 authorizations per topic and without word treatment selected more severe patients: an average cost per paid prior authorization of BRL 10,206 and a total expenditure of BRL 20.3 million (5.4%) in 1,987 prior authorizations (1.9%). It had 70% accuracy compared with human analysis, and 20% of cases were of potential interest, all subject to analysis for inclusion in a health promotion program. It had a substantial loss of cases compared with the traditional research model with structured language, and it identified other disease groups: orthopedic, mental, and cancer. The BERTopic model served as an exploratory method to be used for case labeling and subsequent application in supervised models. The automatic identification of other diseases raises ethical questions about the handling of health information by machine learning.
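The figures quoted in the description can be cross-checked with simple arithmetic. The sketch below is not part of the study. It only recomputes the reported shares from the numbers in the abstract. The constant and variable names are illustrative, and the implied total expenditure is a derived estimate that assumes the 5.4% refers to the provider's total paid amount.

```python
# Figures quoted in the abstract of this record.
TOTAL_AUTHORIZATIONS = 105_008      # prior authorizations analyzed
SELECTED_AUTHORIZATIONS = 1_987     # authorizations in the selected BERTopic topic
AVERAGE_COST_BRL = 10_206           # average cost per paid prior authorization
REPORTED_SPEND_BRL = 20.3e6         # reported expenditure for the selected topic
REPORTED_SPEND_SHARE = 0.054        # reported share of expenditure (5.4%)

# Share of all authorizations that fall in the selected topic.
selected_share = SELECTED_AUTHORIZATIONS / TOTAL_AUTHORIZATIONS

# Expenditure implied by count times average cost.
selected_spend = SELECTED_AUTHORIZATIONS * AVERAGE_COST_BRL

# Assumption: 5.4% is a share of total expenditure, so the total can be
# estimated by division. This value does not appear in the source.
implied_total_spend = REPORTED_SPEND_BRL / REPORTED_SPEND_SHARE

print(f"Share of authorizations: {selected_share:.1%}")
print(f"Spend in selected topic: BRL {selected_spend / 1e6:.1f} million")
print(f"Implied total spend: about BRL {implied_total_spend / 1e6:.0f} million")
```

The recomputed share (1.9%) and expenditure (BRL 20.3 million) agree with the reported values, which suggests the selected topic accounts for a share of spending roughly three times its share of authorizations.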
format Online Article Text
id pubmed-10695477
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Escola Nacional de Saúde Pública Sergio Arouca, Fundação Oswaldo Cruz
record_format MEDLINE/PubMed
spelling pubmed-10695477 2023-12-05 Cad Saude Publica Article Escola Nacional de Saúde Pública Sergio Arouca, Fundação Oswaldo Cruz 2023-12-04 /pmc/articles/PMC10695477/ http://dx.doi.org/10.1590/0102-311XEN243722 Text en https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License.
topic Article