
Impact of COVID-19 research: a study on predicting influential scholarly documents using machine learning and a domain-independent knowledge graph

Multiple studies have investigated bibliometric features and uncategorized scholarly documents for the influential scholarly document prediction task. In this paper, we describe our work that attempts to go beyond bibliometric metadata to predict influential scholarly documents. Furthermore, this wo...


Bibliographic Details
Main Authors: Rabby, Gollam; D’Souza, Jennifer; Oelen, Allard; Dvorackova, Lucie; Svátek, Vojtěch; Auer, Sören
Format: Online Article Text
Language: English
Published: BioMed Central 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10683290/
https://www.ncbi.nlm.nih.gov/pubmed/38017587
http://dx.doi.org/10.1186/s13326-023-00298-4
author Rabby, Gollam
D’Souza, Jennifer
Oelen, Allard
Dvorackova, Lucie
Svátek, Vojtěch
Auer, Sören
collection PubMed
description Multiple studies have investigated bibliometric features and uncategorized scholarly documents for the influential scholarly document prediction task. In this paper, we describe our work that attempts to go beyond bibliometric metadata to predict influential scholarly documents. Furthermore, this work also examines the influential scholarly document prediction task over categorized scholarly documents. We also introduce a new approach that enhances the document representation method with a domain-independent knowledge graph to find influential scholarly documents using categorized scholarly content. As the input collection, we use the WHO corpus of scholarly documents on the theme of COVID-19. This study examines different document representation methods for machine learning, including TF-IDF, BOW, and embedding-based language models (BERT). The TF-IDF document representation method worked better than the others. Of the various machine learning methods tested, logistic regression outperformed the others for scholarly document category classification, and the random forest algorithm, with the help of a domain-independent knowledge graph (specifically DBpedia) used to enhance the document representation method, obtained the best results for predicting influential scholarly documents with categorical scholarly content. In this case, our study combines state-of-the-art machine learning methods with the BOW document representation method. We also enhance the BOW document representation with the direct type (RDF type) and unqualified relation from DBpedia. From this experiment, we did not find any impact of the enhanced document representation on scholarly document category classification, but we did find an effect on influential scholarly document prediction with categorical data.
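The idea of enhancing a BOW representation with knowledge-graph types, as the abstract describes it, can be illustrated with a minimal sketch. This is not the authors' pipeline: the lookup table `TOY_TYPES`, the helper names, and the example sentence are all hypothetical, and a real system would obtain types by linking entities against DBpedia rather than from a hand-written dictionary.

```python
from collections import Counter
import re

# Hypothetical stand-in for DBpedia: token -> RDF type labels.
# A real pipeline would get these from entity linking plus DBpedia lookups.
TOY_TYPES = {
    "vaccine": ["dbo:Drug"],
    "pneumonia": ["dbo:Disease"],
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bow(text):
    """Plain bag-of-words: token -> occurrence count."""
    return Counter(tokenize(text))


def enriched_bow(text, types=TOY_TYPES):
    """Bag-of-words plus one extra feature per type of each matched token."""
    counts = bow(text)
    enriched = Counter(counts)
    for token, n in counts.items():
        for type_label in types.get(token, []):
            # Each occurrence of the token also counts toward its type.
            enriched[type_label] += n
    return enriched


# Hypothetical example document.
doc = "Vaccine trials reduce pneumonia; vaccine uptake varies."
features = enriched_bow(doc)
```

The resulting feature dictionary keeps every original token count and adds type features, so two documents that mention different drugs would still share a `dbo:Drug` feature. Such dictionaries could then be passed to any classifier that accepts sparse count features.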
format Online
Article
Text
id pubmed-10683290
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-10683290 2023-11-30 J Biomed Semantics Research BioMed Central 2023-11-28 /pmc/articles/PMC10683290/ /pubmed/38017587 http://dx.doi.org/10.1186/s13326-023-00298-4 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Impact of COVID-19 research: a study on predicting influential scholarly documents using machine learning and a domain-independent knowledge graph
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10683290/
https://www.ncbi.nlm.nih.gov/pubmed/38017587
http://dx.doi.org/10.1186/s13326-023-00298-4