Unsupervised natural language processing in the identification of patients with suspected COVID-19 infection
Main authors: | da Silva, Rildo Pinto; Pollettini, Juliana Tarossi; Pazin, Antonio |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Escola Nacional de Saúde Pública Sergio Arouca, Fundação Oswaldo Cruz, 2023 |
Subjects: | Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10695477/ http://dx.doi.org/10.1590/0102-311XEN243722 |
author | da Silva, Rildo Pinto; Pollettini, Juliana Tarossi; Pazin, Antonio |
---|---|
collection | PubMed |
description | Patients with post-COVID-19 syndrome benefit from health promotion programs, and identifying them rapidly is important for the cost-effective use of these programs. Traditional identification techniques perform poorly, especially during pandemics. A descriptive observational study was carried out on 105,008 prior authorizations paid by a private health care provider, applying an unsupervised natural language processing method based on topic modeling to identify patients suspected of being infected with COVID-19. Six models were generated: three with the BERTopic algorithm and three with Word2Vec. BERTopic creates disease groups automatically, whereas the Word2Vec models required manual analysis of the first 100 cases of each topic to determine which topics were related to COVID-19. The BERTopic model with more than 1,000 authorizations per topic and no word preprocessing selected more severe patients: an average cost per paid prior authorization of BRL 10,206 and a total expenditure of BRL 20.3 million (5.4%) across 1,987 prior authorizations (1.9%). It achieved 70% accuracy relative to human analysis, and 20% of the cases were of potential interest, all subject to review for inclusion in a health promotion program. Compared with the traditional research model based on structured language, it missed a substantial number of cases, and it also identified other disease groups: orthopedic, mental health, and cancer. The BERTopic model served as an exploratory method for labeling cases, with the labels subsequently usable in supervised models. The automatic identification of other diseases raises ethical questions about the processing of health information by machine learning. |
format | Online Article Text |
id | pubmed-10695477 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Escola Nacional de Saúde Pública Sergio Arouca, Fundação Oswaldo Cruz |
record_format | MEDLINE/PubMed |
spelling | pubmed-10695477, 2023-12-05. Unsupervised natural language processing in the identification of patients with suspected COVID-19 infection. da Silva, Rildo Pinto; Pollettini, Juliana Tarossi; Pazin, Antonio. Cad Saude Publica. Article. Escola Nacional de Saúde Pública Sergio Arouca, Fundação Oswaldo Cruz, 2023-12-04. /pmc/articles/PMC10695477/ http://dx.doi.org/10.1590/0102-311XEN243722. Text, en. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License. |
title | Unsupervised natural language processing in the identification of patients with suspected COVID-19 infection |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10695477/ http://dx.doi.org/10.1590/0102-311XEN243722 |
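
The cost figures in the description can be cross-checked with simple arithmetic. The Python sketch below is illustrative only: the constant and function names are hypothetical and do not come from the paper, and it does not reproduce the BERTopic or Word2Vec modeling itself. It recomputes the share of authorizations selected (1,987 of 105,008) and the selected spend (count × average cost), then backs out the total expenditure implied by the reported 5.4% share. Because the average cost and the 5.4% share are rounded, that implied total is only approximate.

```python
# Figures as reported in the description (BERTopic model, >1,000 authorizations per topic).
# Constant names are hypothetical labels chosen for this sketch.
TOTAL_AUTHORIZATIONS = 105_008   # paid prior authorizations in the study
SELECTED_AUTHORIZATIONS = 1_987  # authorizations selected by the model
AVG_COST_BRL = 10_206            # average paid cost per selected authorization (BRL)
REPORTED_SPEND_SHARE = 0.054     # selected spend as a share of total spend (5.4%, rounded)


def selected_share(selected: int, total: int) -> float:
    """Fraction of all prior authorizations that the model selected."""
    return selected / total


def selected_spend(selected: int, avg_cost: float) -> float:
    """Total paid amount for the selected authorizations (count x average cost)."""
    return selected * avg_cost


def implied_total_spend(spend: float, share: float) -> float:
    """Back out overall expenditure from the selected spend and its reported share."""
    return spend / share


if __name__ == "__main__":
    share = selected_share(SELECTED_AUTHORIZATIONS, TOTAL_AUTHORIZATIONS)
    spend = selected_spend(SELECTED_AUTHORIZATIONS, AVG_COST_BRL)
    total = implied_total_spend(spend, REPORTED_SPEND_SHARE)
    print(f"Share of authorizations: {share:.1%}")            # → 1.9%
    print(f"Selected spend: BRL {spend / 1e6:.1f} million")   # → 20.3
    print(f"Implied total spend: BRL {total / 1e6:.0f} million")
```

With the reported inputs this gives about 1.9% of authorizations and roughly BRL 20.3 million, matching the description; the overall expenditure implied by the rounded 5.4% share comes out near BRL 376 million, an estimate derived from the record's figures rather than a value stated in it.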