
Understanding progress in software citation: a study of software citation in the CORD-19 corpus

In this paper, we investigate progress toward improved software citation by examining current software citation practices. We first introduce our machine learning based data pipeline that extracts software mentions from the CORD-19 corpus, a regularly updated collection of more than 280,000 scholarl...


Bibliographic Details
Main Authors: Du, Caifan; Cohoon, Johanna; Lopez, Patrice; Howison, James
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2022
Subjects: Data Science
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9454791/
https://www.ncbi.nlm.nih.gov/pubmed/36091992
http://dx.doi.org/10.7717/peerj-cs.1022
_version_ 1784785434790854656
author Du, Caifan
Cohoon, Johanna
Lopez, Patrice
Howison, James
collection PubMed
description In this paper, we investigate progress toward improved software citation by examining current software citation practices. We first introduce our machine learning based data pipeline that extracts software mentions from the CORD-19 corpus, a regularly updated collection of more than 280,000 scholarly articles on COVID-19 and related historical coronaviruses. We then closely examine a stratified sample of extracted software mentions from recent CORD-19 publications to understand the status of software citation. We also searched online for the mentioned software projects and their citation requests. We evaluate both practices of referencing software in publications and making software citable in comparison with earlier findings and recent advocacy recommendations. We found increased mentions of software versions, increased open source practices, and improved software accessibility. Yet, we also found a continuation of high numbers of informal mentions that did not sufficiently credit software authors. Existing software citation requests were diverse but did not match with software citation advocacy recommendations nor were they frequently followed by researchers authoring papers. Finally, we discuss implications for software citation advocacy and standard making efforts seeking to improve the situation. Our results show the diversity of software citation practices and how they differ from advocacy recommendations, provide a baseline for assessing the progress of software citation implementation, and enrich the understanding of existing challenges.
format Online
Article
Text
id pubmed-9454791
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-9454791 2022-09-09
Understanding progress in software citation: a study of software citation in the CORD-19 corpus
Du, Caifan
Cohoon, Johanna
Lopez, Patrice
Howison, James
PeerJ Comput Sci
Data Science
In this paper, we investigate progress toward improved software citation by examining current software citation practices. We first introduce our machine learning based data pipeline that extracts software mentions from the CORD-19 corpus, a regularly updated collection of more than 280,000 scholarly articles on COVID-19 and related historical coronaviruses. We then closely examine a stratified sample of extracted software mentions from recent CORD-19 publications to understand the status of software citation. We also searched online for the mentioned software projects and their citation requests. We evaluate both practices of referencing software in publications and making software citable in comparison with earlier findings and recent advocacy recommendations. We found increased mentions of software versions, increased open source practices, and improved software accessibility. Yet, we also found a continuation of high numbers of informal mentions that did not sufficiently credit software authors. Existing software citation requests were diverse but did not match with software citation advocacy recommendations nor were they frequently followed by researchers authoring papers. Finally, we discuss implications for software citation advocacy and standard making efforts seeking to improve the situation. Our results show the diversity of software citation practices and how they differ from advocacy recommendations, provide a baseline for assessing the progress of software citation implementation, and enrich the understanding of existing challenges.
PeerJ Inc. 2022-07-25
/pmc/articles/PMC9454791/
/pubmed/36091992
http://dx.doi.org/10.7717/peerj-cs.1022
Text en
©2022 Du et al.
https://creativecommons.org/licenses/by/4.0/
This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
title Understanding progress in software citation: a study of software citation in the CORD-19 corpus
topic Data Science
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9454791/
https://www.ncbi.nlm.nih.gov/pubmed/36091992
http://dx.doi.org/10.7717/peerj-cs.1022