Sizing the Problem of Improving Discovery and Access to NIH-Funded Data: A Preliminary Study

OBJECTIVE: This study informs efforts to improve the discoverability of and access to biomedical datasets by providing a preliminary estimate of the number and type of datasets generated annually by research funded by the U.S. National Institutes of Health (NIH). It focuses on those datasets that ar...

Bibliographic Details
Main authors:	Read, Kevin B., Sheehan, Jerry R., Huerta, Michael F., Knecht, Lou S., Mork, James G., Humphreys, Betsy L.
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2015
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4514623/ https://www.ncbi.nlm.nih.gov/pubmed/26207759 http://dx.doi.org/10.1371/journal.pone.0132735

collection	PubMed
description	OBJECTIVE: This study informs efforts to improve the discoverability of and access to biomedical datasets by providing a preliminary estimate of the number and type of datasets generated annually by research funded by the U.S. National Institutes of Health (NIH). It focuses on those datasets that are “invisible” or not deposited in a known repository. METHODS: We analyzed NIH-funded journal articles that were published in 2011, cited in PubMed and deposited in PubMed Central (PMC) to identify those that indicate data were submitted to a known repository. After excluding those articles, we analyzed a random sample of the remaining articles to estimate how many and what types of invisible datasets were used in each article. RESULTS: About 12% of the articles explicitly mention deposition of datasets in recognized repositories, leaving 88% that are invisible datasets. Among articles with invisible datasets, we found an average of 2.9 to 3.4 datasets, suggesting there were approximately 200,000 to 235,000 invisible datasets generated from NIH-funded research published in 2011. Approximately 87% of the invisible datasets consist of data newly collected for the research reported; 13% reflect reuse of existing data. More than 50% of the datasets were derived from live human or non-human animal subjects. CONCLUSION: In addition to providing a rough estimate of the total number of datasets produced per year by NIH-funded researchers, this study identifies additional issues that must be addressed to improve the discoverability of and access to biomedical research data: the definition of a “dataset,” determination of which (if any) data are valuable for archiving and preservation, and better methods for estimating the number of datasets of interest. Lack of consensus amongst annotators about the number of datasets in a given article reinforces the need for a principled way of thinking about how to identify and characterize biomedical datasets.
format	Online Article Text
id	pubmed-4514623
institution	National Center for Biotechnology Information
language	English
publishDate	2015
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-4514623 2015-07-29 PLoS One Research Article Public Library of Science 2015-07-24 /pmc/articles/PMC4514623/ /pubmed/26207759 http://dx.doi.org/10.1371/journal.pone.0132735 Text en https://creativecommons.org/publicdomain/zero/1.0/ This is an open-access article distributed under the terms of the Creative Commons Public Domain declaration, which stipulates that, once placed in the public domain, this work may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose.
title	Sizing the Problem of Improving Discovery and Access to NIH-Funded Data: A Preliminary Study
topic	Research Article
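
The RESULTS figures in the description field can be cross-checked with simple arithmetic. The sketch below is illustrative only: the record does not state how many articles were analyzed, so the article counts it prints are back-calculated from the quoted averages and totals and are an assumption, not a figure reported by the authors. All variable and function names are hypothetical.

```python
# Figures quoted in the RESULTS part of the description field.
DEPOSITED_SHARE = 0.12                # articles mentioning a recognized repository
INVISIBLE_SHARE = 0.88                # remaining articles
AVG_DATASETS = (2.9, 3.4)             # low and high average datasets per article
TOTAL_DATASETS = (200_000, 235_000)   # low and high estimate of invisible datasets


def implied_articles(total_datasets: float, avg_per_article: float) -> float:
    """Back out the article count implied by a dataset total and an average."""
    return total_datasets / avg_per_article


# Assumption: the low total pairs with the low average, the high with the high.
low_articles = implied_articles(TOTAL_DATASETS[0], AVG_DATASETS[0])
high_articles = implied_articles(TOTAL_DATASETS[1], AVG_DATASETS[1])

# Scale up to all NIH-funded articles using the 88% share (also an inference).
low_all = low_articles / INVISIBLE_SHARE
high_all = high_articles / INVISIBLE_SHARE

if __name__ == "__main__":
    print(f"Implied articles with invisible datasets: "
          f"{low_articles:,.0f} to {high_articles:,.0f}")
    print(f"Implied total articles analyzed: {low_all:,.0f} to {high_all:,.0f}")
```

Both ends of the range imply roughly 69,000 articles with invisible datasets, which suggests the quoted averages and totals are internally consistent.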
