
Unsupervised natural language processing in the identification of patients with suspected COVID-19 infection

Patients with post-COVID-19 syndrome benefit from health promotion programs. Their rapid identification is important for the cost-effective use of these programs. Traditional identification techniques perform poorly, especially in pandemics. A descriptive observational study was carried out using 105...


Bibliographic Details
Main authors:	da Silva, Rildo Pinto; Pollettini, Juliana Tarossi; Pazin, Antonio
Format:	Online Article Text
Language:	English
Published:	Escola Nacional de Saúde Pública Sergio Arouca, Fundação Oswaldo Cruz 2023
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10695477/ http://dx.doi.org/10.1590/0102-311XEN243722

author	da Silva, Rildo Pinto; Pollettini, Juliana Tarossi; Pazin, Antonio
author_sort	da Silva, Rildo Pinto
collection	PubMed
description	Patients with post-COVID-19 syndrome benefit from health promotion programs, and identifying them quickly is important for the cost-effective use of these programs. Traditional identification techniques perform poorly, especially in pandemics. A descriptive observational study was carried out on 105,008 prior authorizations paid by a private health care provider, applying an unsupervised natural language processing method (topic modeling) to identify patients suspected of being infected with COVID-19. Six models were generated: three using the BERTopic algorithm and three using Word2Vec. The BERTopic model creates disease groups automatically. With the Word2Vec model, manual analysis of the first 100 cases of each topic was necessary to define the topics related to COVID-19. The BERTopic model with more than 1,000 authorizations per topic and without word treatment selected more severe patients: the average cost per paid prior authorization was BRL 10,206, and total expenditure was BRL 20.3 million (5.4%) across 1,987 prior authorizations (1.9%). It had 70% accuracy compared with human analysis, and 20% of cases were of potential interest, all subject to analysis for inclusion in a health promotion program. It showed a considerable loss of cases compared with the traditional research model based on structured language, and it identified other groups of diseases: orthopedic, mental, and cancer. The BERTopic model served as an exploratory method to be used in case labeling and subsequent application in supervised models. The automatic identification of other diseases raises ethical questions about the treatment of health information by machine learning.
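The figures quoted in the description can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration and is not taken from the article: the constant and function names are hypothetical, and it assumes that the 1.9% refers to the share of all 105,008 authorizations and the 5.4% to the share of total expenditure. The implied overall expenditure is derived from those two numbers and is not stated in the record.

```python
# Figures as quoted in the description field (amounts in BRL).
TOTAL_AUTHORIZATIONS = 105_008      # all paid prior authorizations studied
SELECTED_AUTHORIZATIONS = 1_987     # authorizations selected by the BERTopic model
AVERAGE_COST_BRL = 10_206           # average cost per selected authorization
REPORTED_SPEND_BRL = 20.3e6         # reported expenditure on the selected group
REPORTED_SPEND_SHARE = 0.054        # reported share of expenditure (5.4%)


def check_reported_figures():
    """Recompute the shares and totals implied by the quoted figures."""
    # Share of authorizations selected (the record reports 1.9%).
    authorization_share = SELECTED_AUTHORIZATIONS / TOTAL_AUTHORIZATIONS
    # Expenditure implied by average cost times count (the record reports 20.3 million).
    implied_spend_brl = AVERAGE_COST_BRL * SELECTED_AUTHORIZATIONS
    # ASSUMPTION: 5.4% is a share of total expenditure, so the overall total can be
    # backed out. This value is derived here and does not appear in the record.
    implied_total_spend_brl = REPORTED_SPEND_BRL / REPORTED_SPEND_SHARE
    return {
        "authorization_share": authorization_share,
        "implied_spend_brl": implied_spend_brl,
        "implied_total_spend_brl": implied_total_spend_brl,
    }


if __name__ == "__main__":
    for name, value in check_reported_figures().items():
        print(f"{name}: {value:,.4f}")
```

Under these assumptions the recomputed share of authorizations and the product of average cost and count both round to the reported 1.9% and BRL 20.3 million, so the quoted numbers are mutually consistent.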
format	Online Article Text
id	pubmed-10695477
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Escola Nacional de Saúde Pública Sergio Arouca, Fundação Oswaldo Cruz
record_format	MEDLINE/PubMed
spelling	pubmed-10695477 2023-12-05 Unsupervised natural language processing in the identification of patients with suspected COVID-19 infection da Silva, Rildo Pinto; Pollettini, Juliana Tarossi; Pazin, Antonio Cad Saude Publica Article Escola Nacional de Saúde Pública Sergio Arouca, Fundação Oswaldo Cruz 2023-12-04 /pmc/articles/PMC10695477/ http://dx.doi.org/10.1590/0102-311XEN243722 Text en https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License
title	Unsupervised natural language processing in the identification of patients with suspected COVID-19 infection
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10695477/ http://dx.doi.org/10.1590/0102-311XEN243722
