
Scholarly Context Adrift: Three out of Four URI References Lead to Changed Content

Increasingly, scholarly articles contain URI references to “web at large” resources including project web sites, scholarly wikis, ontologies, online debates, presentations, blogs, and videos. Authors reference such resources to provide essential context for the research they report on. A reader who...


Bibliographic Details
Main Authors: Jones, Shawn M., Van de Sompel, Herbert, Shankar, Harihar, Klein, Martin, Tobin, Richard, Grover, Claire
Format: Online Article Text
Language: English
Published: Public Library of Science 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5135130/
https://www.ncbi.nlm.nih.gov/pubmed/27911955
http://dx.doi.org/10.1371/journal.pone.0167475
_version_ 1782471573359296512
author Jones, Shawn M.
Van de Sompel, Herbert
Shankar, Harihar
Klein, Martin
Tobin, Richard
Grover, Claire
author_sort Jones, Shawn M.
collection PubMed
description Increasingly, scholarly articles contain URI references to “web at large” resources including project web sites, scholarly wikis, ontologies, online debates, presentations, blogs, and videos. Authors reference such resources to provide essential context for the research they report on. A reader who visits a web at large resource by following a URI reference in an article, some time after its publication, is led to believe that the resource’s content is representative of what the author originally referenced. However, due to the dynamic nature of the web, that may very well not be the case. We reuse a dataset from a previous study in which several authors of this paper were involved, and investigate to what extent the textual content of web at large resources referenced in a vast collection of Science, Technology, and Medicine (STM) articles published between 1997 and 2012 has remained stable since the publication of the referencing article. We do so in a two-step approach that relies on various well-established similarity measures to compare textual content. In a first step, we use 19 web archives to find snapshots of referenced web at large resources that have textual content that is representative of the state of the resource around the time of publication of the referencing paper. We find that representative snapshots exist for about 30% of all URI references. In a second step, we compare the textual content of representative snapshots with that of their live web counterparts. We find that for over 75% of references the content has drifted away from what it was when referenced. These results raise significant concerns regarding the long term integrity of the web-based scholarly record and call for the deployment of techniques to combat these problems.
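The second step described in the abstract, comparing the text of an archived snapshot with that of its live web counterpart, can be illustrated with a minimal sketch. The abstract says only that "various well-established similarity measures" were used; Jaccard similarity over word sets is chosen here purely as an assumption, and the function names and the `threshold` parameter are hypothetical rather than taken from the paper.

```python
import re


def tokens(text: str) -> set:
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard_similarity(a: str, b: str) -> float:
    """Jaccard similarity of two texts: |A ∩ B| / |A ∪ B| over word sets."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        # Two empty texts are treated as identical (a convention chosen here).
        return 1.0
    return len(ta & tb) / len(ta | tb)


def has_drifted(snapshot_text: str, live_text: str, threshold: float = 1.0) -> bool:
    """Flag drift when similarity falls below a hypothetical threshold.

    The default of 1.0 flags any change in the word set; the threshold
    used in the study is not stated in this record.
    """
    return jaccard_similarity(snapshot_text, live_text) < threshold


if __name__ == "__main__":
    archived = "Project home page with downloads and documentation"
    live = "This domain is for sale"
    print(jaccard_similarity(archived, live))
    print(has_drifted(archived, live))
```

A set-based measure ignores word order and repetition, so it is only a coarse signal; a real comparison would likely combine several measures, as the abstract indicates.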
format Online
Article
Text
id pubmed-5135130
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-5135130 2016-12-21 Scholarly Context Adrift: Three out of Four URI References Lead to Changed Content Jones, Shawn M. Van de Sompel, Herbert Shankar, Harihar Klein, Martin Tobin, Richard Grover, Claire PLoS One Research Article Public Library of Science 2016-12-02 /pmc/articles/PMC5135130/ /pubmed/27911955 http://dx.doi.org/10.1371/journal.pone.0167475 Text en https://creativecommons.org/publicdomain/zero/1.0/ This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 (https://creativecommons.org/publicdomain/zero/1.0/) public domain dedication.
title Scholarly Context Adrift: Three out of Four URI References Lead to Changed Content
title_sort scholarly context adrift: three out of four uri references lead to changed content
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5135130/
https://www.ncbi.nlm.nih.gov/pubmed/27911955
http://dx.doi.org/10.1371/journal.pone.0167475