Impact of COVID-19 research: a study on predicting influential scholarly documents using machine learning and a domain-independent knowledge graph

Multiple studies have investigated bibliometric features and uncategorized scholarly documents for the influential scholarly document prediction task. In this paper, we describe our work that attempts to go beyond bibliometric metadata to predict influential scholarly documents. Furthermore, this wo...

Bibliographic Details
Main Authors:	Rabby, Gollam; D’Souza, Jennifer; Oelen, Allard; Dvorackova, Lucie; Svátek, Vojtěch; Auer, Sören
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2023
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10683290/ https://www.ncbi.nlm.nih.gov/pubmed/38017587 http://dx.doi.org/10.1186/s13326-023-00298-4

_version_	1785151162620575744
author	Rabby, Gollam; D’Souza, Jennifer; Oelen, Allard; Dvorackova, Lucie; Svátek, Vojtěch; Auer, Sören
author_facet	Rabby, Gollam; D’Souza, Jennifer; Oelen, Allard; Dvorackova, Lucie; Svátek, Vojtěch; Auer, Sören
author_sort	Rabby, Gollam
collection	PubMed
description	Multiple studies have investigated bibliometric features and uncategorized scholarly documents for the influential scholarly document prediction task. In this paper, we describe our work that attempts to go beyond bibliometric metadata to predict influential scholarly documents. Furthermore, this work also examines the influential scholarly document prediction task over categorized scholarly documents. We also introduce a new approach to enhance the document representation method with a domain-independent knowledge graph to find the influential scholarly document using categorized scholarly content. As the input collection, we use the WHO corpus with scholarly documents on the theme of COVID-19. This study examines different document representation methods for machine learning, including TF-IDF, BOW, and embedding-based language models (BERT). The TF-IDF document representation method works better than others. From various machine learning methods tested, logistic regression outperformed the other for scholarly document category classification, and the random forest algorithm obtained the best results for influential scholarly document prediction, with the help of a domain-independent knowledge graph, specifically DBpedia, to enhance the document representation method for predicting influential scholarly documents with categorical scholarly content. In this case, our study combines state-of-the-art machine learning methods with the BOW document representation method. We also enhance the BOW document representation with the direct type (RDF type) and unqualified relation from DBpedia. From this experiment, we did not find any impact of the enhanced document representation for the scholarly document category classification. We found an effect in the influential scholarly document prediction with categorical data.
format	Online Article Text
id	pubmed-10683290
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-106832902023-11-30 Impact of COVID-19 research: a study on predicting influential scholarly documents using machine learning and a domain-independent knowledge graph Rabby, Gollam D’Souza, Jennifer Oelen, Allard Dvorackova, Lucie Svátek, Vojtěch Auer, Sören J Biomed Semantics Research Multiple studies have investigated bibliometric features and uncategorized scholarly documents for the influential scholarly document prediction task. In this paper, we describe our work that attempts to go beyond bibliometric metadata to predict influential scholarly documents. Furthermore, this work also examines the influential scholarly document prediction task over categorized scholarly documents. We also introduce a new approach to enhance the document representation method with a domain-independent knowledge graph to find the influential scholarly document using categorized scholarly content. As the input collection, we use the WHO corpus with scholarly documents on the theme of COVID-19. This study examines different document representation methods for machine learning, including TF-IDF, BOW, and embedding-based language models (BERT). The TF-IDF document representation method works better than others. From various machine learning methods tested, logistic regression outperformed the other for scholarly document category classification, and the random forest algorithm obtained the best results for influential scholarly document prediction, with the help of a domain-independent knowledge graph, specifically DBpedia, to enhance the document representation method for predicting influential scholarly documents with categorical scholarly content. In this case, our study combines state-of-the-art machine learning methods with the BOW document representation method. We also enhance the BOW document representation with the direct type (RDF type) and unqualified relation from DBpedia. 
From this experiment, we did not find any impact of the enhanced document representation for the scholarly document category classification. We found an effect in the influential scholarly document prediction with categorical data. BioMed Central 2023-11-28 /pmc/articles/PMC10683290/ /pubmed/38017587 http://dx.doi.org/10.1186/s13326-023-00298-4 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/) . The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/ (https://creativecommons.org/publicdomain/zero/1.0/) ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle	Research Rabby, Gollam D’Souza, Jennifer Oelen, Allard Dvorackova, Lucie Svátek, Vojtěch Auer, Sören Impact of COVID-19 research: a study on predicting influential scholarly documents using machine learning and a domain-independent knowledge graph
title	Impact of COVID-19 research: a study on predicting influential scholarly documents using machine learning and a domain-independent knowledge graph
title_full	Impact of COVID-19 research: a study on predicting influential scholarly documents using machine learning and a domain-independent knowledge graph
title_fullStr	Impact of COVID-19 research: a study on predicting influential scholarly documents using machine learning and a domain-independent knowledge graph
title_full_unstemmed	Impact of COVID-19 research: a study on predicting influential scholarly documents using machine learning and a domain-independent knowledge graph
title_short	Impact of COVID-19 research: a study on predicting influential scholarly documents using machine learning and a domain-independent knowledge graph
title_sort	impact of covid-19 research: a study on predicting influential scholarly documents using machine learning and a domain-independent knowledge graph
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10683290/ https://www.ncbi.nlm.nih.gov/pubmed/38017587 http://dx.doi.org/10.1186/s13326-023-00298-4
work_keys_str_mv	AT rabbygollam impactofcovid19researchastudyonpredictinginfluentialscholarlydocumentsusingmachinelearningandadomainindependentknowledgegraph AT dsouzajennifer impactofcovid19researchastudyonpredictinginfluentialscholarlydocumentsusingmachinelearningandadomainindependentknowledgegraph AT oelenallard impactofcovid19researchastudyonpredictinginfluentialscholarlydocumentsusingmachinelearningandadomainindependentknowledgegraph AT dvorackovalucie impactofcovid19researchastudyonpredictinginfluentialscholarlydocumentsusingmachinelearningandadomainindependentknowledgegraph AT svatekvojtech impactofcovid19researchastudyonpredictinginfluentialscholarlydocumentsusingmachinelearningandadomainindependentknowledgegraph AT auersoren impactofcovid19researchastudyonpredictinginfluentialscholarlydocumentsusingmachinelearningandadomainindependentknowledgegraph
