
Evaluating the Ability of Open-Source Artificial Intelligence to Predict Accepting-Journal Impact Factor and Eigenfactor Score Using Academic Article Abstracts: Cross-sectional Machine Learning Analysis


Bibliographic Details
Main Authors: Macri, Carmelo, Bacchi, Stephen, Teoh, Sheng Chieh, Lim, Wan Yin, Lam, Lydia, Patel, Sandy, Slee, Mark, Casson, Robert, Chan, WengOnn
Format: Online Article Text
Language: English
Published: JMIR Publications 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10031443/
https://www.ncbi.nlm.nih.gov/pubmed/36881455
http://dx.doi.org/10.2196/42789
collection PubMed
description
BACKGROUND: Strategies to improve the selection of appropriate target journals may reduce delays in disseminating research results. Machine learning is increasingly used in content-based recommender algorithms to guide journal submissions for academic articles.
OBJECTIVE: We sought to evaluate the performance of open-source artificial intelligence to predict the impact factor or Eigenfactor score tertile using academic article abstracts.
METHODS: PubMed-indexed articles published between 2016 and 2021 were identified with the Medical Subject Headings (MeSH) terms “ophthalmology,” “radiology,” and “neurology.” Journals, titles, abstracts, author lists, and MeSH terms were collected. Journal impact factor and Eigenfactor scores were sourced from the 2020 Clarivate Journal Citation Report. The journals included in the study were allocated percentile ranks based on impact factor and Eigenfactor scores, compared with other journals that released publications in the same year. All abstracts were preprocessed, which included the removal of the abstract structure, and combined with titles, authors, and MeSH terms as a single input. The input data underwent preprocessing with the inbuilt ktrain Bidirectional Encoder Representations from Transformers (BERT) preprocessing library before analysis with BERT. Before use for logistic regression and XGBoost models, the input data underwent punctuation removal, negation detection, stemming, and conversion into a term frequency-inverse document frequency array. Following this preprocessing, data were randomly split into training and testing data sets with a 3:1 train:test ratio. Models were developed to predict whether a given article would be published in a first, second, or third tertile journal (0-33rd centile, 34th-66th centile, or 67th-100th centile), as ranked either by impact factor or Eigenfactor score. BERT, XGBoost, and logistic regression models were developed on the training data set before evaluation on the hold-out test data set. The primary outcome was overall classification accuracy for the best-performing model in the prediction of accepting journal impact factor tertile.
RESULTS: There were 10,813 articles from 382 unique journals. The median impact factor and Eigenfactor score were 2.117 (IQR 1.102-2.622) and 0.00247 (IQR 0.00105-0.03), respectively. The BERT model achieved the highest impact factor tertile classification accuracy of 75.0%, followed by an accuracy of 71.6% for XGBoost and 65.4% for logistic regression. Similarly, BERT achieved the highest Eigenfactor score tertile classification accuracy of 73.6%, followed by an accuracy of 71.8% for XGBoost and 65.3% for logistic regression.
CONCLUSIONS: Open-source artificial intelligence can predict the impact factor and Eigenfactor score of accepting peer-reviewed journals. Further studies are required to examine the effect on publication success and the time-to-publication of such recommender systems.
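The target labels and the 3:1 split described in the METHODS can be sketched in plain Python. The tertile cut points (0-33rd, 34th-66th, and 67th-100th centile) follow the abstract; the function names and random seed below are illustrative, not taken from the study's code.

```python
import random

def tertile(percentile):
    # Map a journal's percentile rank (0-100) to a tertile label,
    # mirroring the study's cut points: 0-33rd, 34th-66th, 67th-100th.
    if percentile <= 33:
        return 0
    elif percentile <= 66:
        return 1
    return 2

def split_3to1(items, seed=42):
    # Shuffle and split records into training and testing sets with the
    # 3:1 train:test ratio described in the abstract.
    rng = random.Random(seed)
    shuffled = items[:]
    rng.shuffle(shuffled)
    cut = (3 * len(shuffled)) // 4
    return shuffled[:cut], shuffled[cut:]
```

Applied to the study's 10,813 articles, such a split would leave roughly 2,700 held-out articles for evaluation.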
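The classical baselines can be approximated with scikit-learn: text is converted to a term frequency-inverse document frequency array and fed to a logistic regression classifier predicting the tertile. This is a minimal sketch, not the authors' pipeline — the negation-detection and stemming steps the abstract mentions are omitted, and the tiny corpus and labels below are invented for illustration (the study used 10,813 PubMed abstracts).

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented example inputs: in the study, each input combined the title,
# author list, MeSH terms, and destructured abstract into a single string.
abstracts = [
    "randomized trial of intravitreal injection outcomes",
    "case report of a rare optic neuropathy",
    "deep learning for diagnostic radiology imaging",
    "cohort study of stroke thrombolysis timing",
]
tertiles = [2, 0, 2, 1]  # invented accepting-journal impact factor tertiles

# TF-IDF vectorization followed by logistic regression, mirroring the
# abstract's description of the classical baseline.
model = make_pipeline(
    TfidfVectorizer(lowercase=True, stop_words="english"),
    LogisticRegression(max_iter=1000),
)
model.fit(abstracts, tertiles)
preds = model.predict(abstracts)
```

An analogous XGBoost model would swap `LogisticRegression` for `xgboost.XGBClassifier` on the same TF-IDF features, and the BERT model in the study instead used ktrain's built-in BERT preprocessing on the raw text.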
format Online Article Text
id pubmed-10031443
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-100314432023-03-23 J Med Internet Res Original Paper JMIR Publications 2023-03-07 /pmc/articles/PMC10031443/ /pubmed/36881455 http://dx.doi.org/10.2196/42789 Text en ©Carmelo Macri, Stephen Bacchi, Sheng Chieh Teoh, Wan Yin Lim, Lydia Lam, Sandy Patel, Mark Slee, Robert Casson, WengOnn Chan. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 07.03.2023.
https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.
topic Original Paper