A Comprehensive Overview of the COVID-19 Literature: Machine Learning–Based Bibliometric Analysis

BACKGROUND: Shortly after the emergence of COVID-19, researchers rapidly mobilized to study numerous aspects of the disease such as its evolution, clinical manifestations, effects, treatments, and vaccinations. This led to a rapid increase in the number of COVID-19–related publications. Identifying...

Bibliographic Details
Main Authors: Abd-Alrazaq, Alaa; Schneider, Jens; Mifsud, Borbala; Alam, Tanvir; Househ, Mowafa; Hamdi, Mounir; Shah, Zubair
Format: Online Article Text
Language: English
Published: JMIR Publications 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7942394/
https://www.ncbi.nlm.nih.gov/pubmed/33600346
http://dx.doi.org/10.2196/23703
_version_ 1783662308468195328
author Abd-Alrazaq, Alaa
Schneider, Jens
Mifsud, Borbala
Alam, Tanvir
Househ, Mowafa
Hamdi, Mounir
Shah, Zubair
collection PubMed
description BACKGROUND: Shortly after the emergence of COVID-19, researchers rapidly mobilized to study numerous aspects of the disease such as its evolution, clinical manifestations, effects, treatments, and vaccinations. This led to a rapid increase in the number of COVID-19–related publications. Identifying trends and areas of interest using traditional review methods (eg, scoping and systematic reviews) for such a large domain area is challenging. OBJECTIVE: We aimed to conduct an extensive bibliometric analysis to provide a comprehensive overview of the COVID-19 literature. METHODS: We used the COVID-19 Open Research Dataset (CORD-19), which consists of a large number of research articles related to all coronaviruses. We used a machine learning–based method to analyze the most relevant COVID-19–related articles and extracted the most prominent topics. Specifically, we used a clustering algorithm to group published articles based on the similarity of their abstracts to identify research hotspots and current research directions. We have made our software accessible to the community via GitHub. RESULTS: Of the 196,630 publications retrieved from the database, we included 28,904 in our analysis. The mean number of weekly publications was 990 (SD 789.3). The country that published the highest number of COVID-19–related articles was China (2950/17,270, 17.08%). The largest number of articles was published in bioRxiv. Lei Liu, affiliated with the Southern University of Science and Technology in China, published the highest number of articles (n=46). Based on titles and abstracts alone, we were able to identify 1515 surveys, 733 systematic reviews, 512 cohort studies, 480 meta-analyses, and 362 randomized controlled trials. We identified 19 different topics covered among the publications reviewed. The most dominant topic was public health response, followed by clinical care practices during the COVID-19 pandemic, clinical characteristics and risk factors, and epidemic models for its spread. CONCLUSIONS: We provide an overview of the COVID-19 literature and have identified current hotspots and research directions. Our findings can be useful for the research community to help prioritize research needs and recognize leading COVID-19 researchers, institutes, countries, and publishers. Our study shows that an AI-based bibliometric analysis has the potential to rapidly explore a large corpus of academic publications during a public health crisis. We believe that this work can be used to analyze other eHealth-related literature to help clinicians, administrators, and policy makers obtain a holistic view of the literature and categorize the topics of existing research for further analyses. It can be further scaled (for instance, in time) to clinical summary documentation. Publishers should avoid noise in the data by developing a way to trace the evolution of individual publications and unique authors.
format Online
Article
Text
id pubmed-7942394
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-7942394 2021-03-12 J Med Internet Res Original Paper JMIR Publications 2021-03-08 /pmc/articles/PMC7942394/ /pubmed/33600346 http://dx.doi.org/10.2196/23703 Text en ©Alaa Abd-Alrazaq, Jens Schneider, Borbala Mifsud, Tanvir Alam, Mowafa Househ, Mounir Hamdi, Zubair Shah. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 08.03.2021.
https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.
title A Comprehensive Overview of the COVID-19 Literature: Machine Learning–Based Bibliometric Analysis
topic Original Paper