
Examining linguistic shifts between preprints and publications

Preprints allow researchers to make their findings available to the scientific community before they have undergone peer review. Studies on preprints within bioRxiv have been largely focused on article metadata and how often these preprints are downloaded, cited, published, and discussed online. A m...


Bibliographic Details
Main authors: Nicholson, David N., Rubinetti, Vincent, Hu, Dongbo, Thielk, Marvin, Hunter, Lawrence E., Greene, Casey S.
Format: Online Article Text
Language: English
Published: Public Library of Science 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8806061/
https://www.ncbi.nlm.nih.gov/pubmed/35104289
http://dx.doi.org/10.1371/journal.pbio.3001470
author Nicholson, David N.
Rubinetti, Vincent
Hu, Dongbo
Thielk, Marvin
Hunter, Lawrence E.
Greene, Casey S.
collection PubMed
description Preprints allow researchers to make their findings available to the scientific community before they have undergone peer review. Studies on preprints within bioRxiv have been largely focused on article metadata and how often these preprints are downloaded, cited, published, and discussed online. A missing element that has yet to be examined is the language contained within the bioRxiv preprint repository. We sought to compare and contrast linguistic features within bioRxiv preprints to published biomedical text as a whole, as this is an excellent opportunity to examine how peer review changes these documents. The most prevalent features that changed appear to be associated with typesetting and mentions of supporting information sections or additional files. In addition to text comparison, we created document embeddings derived from a preprint-trained word2vec model. We found that these embeddings are able to parse out different scientific approaches and concepts, link unannotated preprint–peer-reviewed article pairs, and identify journals that publish linguistically similar papers to a given preprint. We also used these embeddings to examine factors associated with the time elapsed between the posting of a first preprint and the appearance of a peer-reviewed publication. We found that preprints with more versions posted and more textual changes took longer to publish. Lastly, we constructed a web application (https://greenelab.github.io/preprint-similarity-search/) that allows users to identify which journals and articles are most linguistically similar to a bioRxiv or medRxiv preprint as well as observe where the preprint would be positioned within a published article landscape.
format Online
Article
Text
id pubmed-8806061
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-8806061 2022-02-02 Examining linguistic shifts between preprints and publications. PLoS Biol. Meta-Research Article.
Public Library of Science 2022-02-01 /pmc/articles/PMC8806061/ /pubmed/35104289 http://dx.doi.org/10.1371/journal.pbio.3001470 Text en © 2022 Nicholson et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Examining linguistic shifts between preprints and publications
topic Meta-Research Article