
Predicting Emerging Themes in Rapidly Expanding COVID-19 Literature With Unsupervised Word Embeddings and Machine Learning: Evidence-Based Study



Bibliographic Details
Main authors:	Pal, Ridam; Chopra, Harshita; Awasthi, Raghav; Bandhey, Harsh; Nagori, Aditya; Sethi, Tavpritesh
Format:	Online Article Text
Language:	English
Published:	JMIR Publications, 2022
Subjects:	Original Paper
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9629347/ https://www.ncbi.nlm.nih.gov/pubmed/36040993 http://dx.doi.org/10.2196/34067

Abstract

BACKGROUND: Evidence from peer-reviewed literature is the cornerstone for designing responses to global threats such as COVID-19. In massive and rapidly growing corpora, such as COVID-19 publications, assimilating and synthesizing information is challenging. Leveraging a robust computational pipeline that evaluates multiple aspects, such as network topological features, communities, and their temporal trends, can make this process more efficient.

OBJECTIVE: We aimed to show that new knowledge can be captured and tracked using the temporal change in the underlying unsupervised word embeddings of the literature. Furthermore, imminent themes can be predicted using machine learning on the evolving associations between words.

METHODS: Frequently occurring medical entities were extracted from the abstracts of more than 150,000 COVID-19 articles in the World Health Organization database, collected at monthly intervals starting in February 2020. Word embeddings trained on each month's literature were used to construct networks of entities, with cosine similarities as edge weights. Topological features of each subsequent month's network were forecast from prior patterns, and new links were predicted using supervised machine learning. Community detection and alluvial diagrams were used to track how biomedical themes evolved over the months.

RESULTS: Thromboembolic complications were detected as an emerging theme as early as August 2020. A shift toward the symptoms of long COVID complications was observed in March 2021, and neurological complications gained significance in June 2021. Prospective validation of the link prediction models achieved an area under the receiver operating characteristic curve of 0.87. Based on the patterns observed in previous months, predictive modeling identified predisposing conditions, symptoms, cross-infection, and neurological complications as dominant research themes in COVID-19 publications.

CONCLUSIONS: Machine learning–based prediction of emerging links can contribute toward steering research by capturing themes represented by groups of medical entities, based on patterns of semantic relationships over time.
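
The METHODS describe monthly entity networks whose edge weights are cosine similarities between word embeddings, followed by prediction of new links evaluated with the area under the ROC curve. The following is a minimal, stdlib-only sketch of those two ideas under stated assumptions: the vectors are hypothetical toy values rather than trained embeddings, the 0.5 similarity threshold for keeping an edge is an assumed parameter (the abstract does not say whether or how edges were pruned), and candidate pairs are scored with a single Jaccard-coefficient heuristic instead of the supervised classifier the study used. AUROC is computed as the probability that a pair which becomes linked in the next month outscores a pair that does not.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def build_network(embeddings, threshold=0.5):
    """Map each entity pair (a, b), with a < b, to its cosine weight if it meets the threshold.

    `threshold` is an assumed pruning parameter, not one reported in the abstract.
    """
    edges = {}
    for a, b in combinations(sorted(embeddings), 2):
        w = cosine(embeddings[a], embeddings[b])
        if w >= threshold:
            edges[(a, b)] = w
    return edges


def neighbors(edges):
    """Adjacency sets for an undirected edge dictionary."""
    nbrs = {}
    for a, b in edges:
        nbrs.setdefault(a, set()).add(b)
        nbrs.setdefault(b, set()).add(a)
    return nbrs


def jaccard(nbrs, a, b):
    """Jaccard coefficient of two nodes' neighborhoods, used here as a heuristic link score."""
    na, nb = nbrs.get(a, set()), nbrs.get(b, set())
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0


def auroc(pos_scores, neg_scores):
    """AUROC as P(random positive outscores random negative); ties count as half."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else (0.5 if p == n else 0.0)
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Hypothetical toy embeddings for two consecutive months (not data from the study).
    month_t = {
        "covid": [1.0, 0.0, 0.0],
        "thrombosis": [0.8, 0.6, 0.0],
        "d-dimer": [0.8, -0.6, 0.0],
        "fatigue": [0.0, 0.0, 1.0],
    }
    month_next = {
        "covid": [1.0, 0.0, 0.0],
        "thrombosis": [0.8, 0.6, 0.0],
        "d-dimer": [0.8, 0.3, 0.0],
        "fatigue": [0.0, 0.0, 1.0],
    }
    net_t = build_network(month_t)
    net_next = build_network(month_next)
    nbrs = neighbors(net_t)
    # Candidates: pairs not linked in month t; positive if linked in the next month.
    candidates = [p for p in combinations(sorted(month_t), 2) if p not in net_t]
    pos = [jaccard(nbrs, a, b) for a, b in candidates if (a, b) in net_next]
    neg = [jaccard(nbrs, a, b) for a, b in candidates if (a, b) not in net_next]
    print("month t edges:", sorted(net_t))
    if pos and neg:
        print("AUROC:", auroc(pos, neg))
```

In the toy data, "d-dimer" and "thrombosis" share a neighbor ("covid") in month t and become linked in the next month, so the shared-neighbor heuristic ranks that pair above the others. In a supervised setup, scores like this would typically be one feature among several fed to a trained classifier; the heuristic is used directly here only to keep the sketch self-contained.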

J Med Internet Res. Original Paper. JMIR Publications, 2022-11-02.

©Ridam Pal, Harshita Chopra, Raghav Awasthi, Harsh Bandhey, Aditya Nagori, Tavpritesh Sethi. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 02.11.2022. This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.