Impact of COVID-19 research: a study on predicting influential scholarly documents using machine learning and a domain-independent knowledge graph
Multiple studies have investigated bibliometric features and uncategorized scholarly documents for the influential scholarly document prediction task. In this paper, we describe our work that attempts to go beyond bibliometric metadata to predict influential scholarly documents. Furthermore, this wo...
Main Authors: | Rabby, Gollam; D’Souza, Jennifer; Oelen, Allard; Dvorackova, Lucie; Svátek, Vojtěch; Auer, Sören |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2023 |
Subjects: | Research |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10683290/ https://www.ncbi.nlm.nih.gov/pubmed/38017587 http://dx.doi.org/10.1186/s13326-023-00298-4 |
_version_ | 1785151162620575744 |
---|---|
author | Rabby, Gollam D’Souza, Jennifer Oelen, Allard Dvorackova, Lucie Svátek, Vojtěch Auer, Sören |
author_facet | Rabby, Gollam D’Souza, Jennifer Oelen, Allard Dvorackova, Lucie Svátek, Vojtěch Auer, Sören |
author_sort | Rabby, Gollam |
collection | PubMed |
description | Multiple studies have investigated bibliometric features and uncategorized scholarly documents for the influential scholarly document prediction task. In this paper, we describe our work that attempts to go beyond bibliometric metadata to predict influential scholarly documents. This work also examines the influential scholarly document prediction task over categorized scholarly documents. We introduce a new approach that enhances the document representation method with a domain-independent knowledge graph to find influential scholarly documents using categorized scholarly content. As the input collection, we use the WHO corpus of scholarly documents on the theme of COVID-19. This study examines different document representation methods for machine learning, including TF-IDF, BOW, and embedding-based language models (BERT). The TF-IDF document representation method works better than the others. Of the machine learning methods tested, logistic regression outperformed the others for scholarly document category classification, and the random forest algorithm obtained the best results for influential scholarly document prediction when a domain-independent knowledge graph, specifically DBpedia, was used to enhance the document representation for categorized scholarly content. In this case, our study combines state-of-the-art machine learning methods with the BOW document representation method. We also enhance the BOW document representation with the direct type (RDF type) and unqualified relation from DBpedia. In this experiment, we did not find any impact of the enhanced document representation on scholarly document category classification. We did find an effect on influential scholarly document prediction with categorical data. |
format | Online Article Text |
id | pubmed-10683290 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-10683290 2023-11-30 Impact of COVID-19 research: a study on predicting influential scholarly documents using machine learning and a domain-independent knowledge graph Rabby, Gollam D’Souza, Jennifer Oelen, Allard Dvorackova, Lucie Svátek, Vojtěch Auer, Sören J Biomed Semantics Research Multiple studies have investigated bibliometric features and uncategorized scholarly documents for the influential scholarly document prediction task. In this paper, we describe our work that attempts to go beyond bibliometric metadata to predict influential scholarly documents. This work also examines the influential scholarly document prediction task over categorized scholarly documents. We introduce a new approach that enhances the document representation method with a domain-independent knowledge graph to find influential scholarly documents using categorized scholarly content. As the input collection, we use the WHO corpus of scholarly documents on the theme of COVID-19. This study examines different document representation methods for machine learning, including TF-IDF, BOW, and embedding-based language models (BERT). The TF-IDF document representation method works better than the others. Of the machine learning methods tested, logistic regression outperformed the others for scholarly document category classification, and the random forest algorithm obtained the best results for influential scholarly document prediction when a domain-independent knowledge graph, specifically DBpedia, was used to enhance the document representation for categorized scholarly content. In this case, our study combines state-of-the-art machine learning methods with the BOW document representation method. We also enhance the BOW document representation with the direct type (RDF type) and unqualified relation from DBpedia. In this experiment, we did not find any impact of the enhanced document representation on scholarly document category classification. We did find an effect on influential scholarly document prediction with categorical data. BioMed Central 2023-11-28 /pmc/articles/PMC10683290/ /pubmed/38017587 http://dx.doi.org/10.1186/s13326-023-00298-4 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Rabby, Gollam D’Souza, Jennifer Oelen, Allard Dvorackova, Lucie Svátek, Vojtěch Auer, Sören Impact of COVID-19 research: a study on predicting influential scholarly documents using machine learning and a domain-independent knowledge graph |
title | Impact of COVID-19 research: a study on predicting influential scholarly documents using machine learning and a domain-independent knowledge graph |
title_full | Impact of COVID-19 research: a study on predicting influential scholarly documents using machine learning and a domain-independent knowledge graph |
title_fullStr | Impact of COVID-19 research: a study on predicting influential scholarly documents using machine learning and a domain-independent knowledge graph |
title_full_unstemmed | Impact of COVID-19 research: a study on predicting influential scholarly documents using machine learning and a domain-independent knowledge graph |
title_short | Impact of COVID-19 research: a study on predicting influential scholarly documents using machine learning and a domain-independent knowledge graph |
title_sort | impact of covid-19 research: a study on predicting influential scholarly documents using machine learning and a domain-independent knowledge graph |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10683290/ https://www.ncbi.nlm.nih.gov/pubmed/38017587 http://dx.doi.org/10.1186/s13326-023-00298-4 |
work_keys_str_mv | AT rabbygollam impactofcovid19researchastudyonpredictinginfluentialscholarlydocumentsusingmachinelearningandadomainindependentknowledgegraph AT dsouzajennifer impactofcovid19researchastudyonpredictinginfluentialscholarlydocumentsusingmachinelearningandadomainindependentknowledgegraph AT oelenallard impactofcovid19researchastudyonpredictinginfluentialscholarlydocumentsusingmachinelearningandadomainindependentknowledgegraph AT dvorackovalucie impactofcovid19researchastudyonpredictinginfluentialscholarlydocumentsusingmachinelearningandadomainindependentknowledgegraph AT svatekvojtech impactofcovid19researchastudyonpredictinginfluentialscholarlydocumentsusingmachinelearningandadomainindependentknowledgegraph AT auersoren impactofcovid19researchastudyonpredictinginfluentialscholarlydocumentsusingmachinelearningandadomainindependentknowledgegraph |