
Wide-Open: Accelerating public data release by automating detection of overdue datasets

Open data is a vital pillar of open science and a key enabler for reproducibility, data reuse, and novel discoveries. Enforcement of open-data policies, however, largely relies on manual efforts, which invariably lag behind the increasingly automated generation of biological data. To address this pr...


Bibliographic Details
Main authors:	Grechkin, Maxim; Poon, Hoifung; Howe, Bill
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2017
Subjects:	Community Page
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5464523/ https://www.ncbi.nlm.nih.gov/pubmed/28594819 http://dx.doi.org/10.1371/journal.pbio.2002477

author	Grechkin, Maxim; Poon, Hoifung; Howe, Bill
collection	PubMed
description	Open data is a vital pillar of open science and a key enabler for reproducibility, data reuse, and novel discoveries. Enforcement of open-data policies, however, largely relies on manual efforts, which invariably lag behind the increasingly automated generation of biological data. To address this problem, we developed a general approach to automatically identify datasets overdue for public release by applying text mining to identify dataset references in published articles and parse query results from repositories to determine if the datasets remain private. We demonstrate the effectiveness of this approach on 2 popular National Center for Biotechnology Information (NCBI) repositories: Gene Expression Omnibus (GEO) and Sequence Read Archive (SRA). Our Wide-Open system identified a large number of overdue datasets, which spurred administrators to respond directly by releasing 400 datasets in one week.
format	Online Article Text
id	pubmed-5464523
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-5464523 2017-06-22 PLoS Biol Public Library of Science 2017-06-08 /pmc/articles/PMC5464523/ /pubmed/28594819 http://dx.doi.org/10.1371/journal.pbio.2002477 Text en © 2017 Grechkin et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title	Wide-Open: Accelerating public data release by automating detection of overdue datasets
topic	Community Page
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5464523/ https://www.ncbi.nlm.nih.gov/pubmed/28594819 http://dx.doi.org/10.1371/journal.pbio.2002477
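
The approach summarized in the description (text mining for dataset references, then checking whether each referenced dataset is still private) can be illustrated with a minimal sketch. This is not the authors' Wide-Open implementation. The accession patterns (GSE plus digits for GEO series, SRP plus digits for SRA studies), the function names, and the `is_public` callback are all assumptions made for illustration. The callback stands in for the step that parses a repository's query result.

```python
import re
from typing import Callable, List

# Assumed accession formats (illustrative, not taken from the record):
# GEO series look like "GSE12345"; SRA studies look like "SRP012345".
GEO_PATTERN = re.compile(r"\bGSE\d{3,}\b")
SRA_PATTERN = re.compile(r"\bSRP\d{6,}\b")


def find_accessions(article_text: str) -> List[str]:
    """Return the unique dataset accessions mentioned in an article, sorted."""
    found = GEO_PATTERN.findall(article_text) + SRA_PATTERN.findall(article_text)
    return sorted(set(found))


def overdue_accessions(article_text: str,
                       is_public: Callable[[str], bool]) -> List[str]:
    """List accessions cited in a published article that are not yet public.

    `is_public` is a hypothetical callback; a real system would query the
    repository and parse the response to decide whether the dataset is private.
    """
    return [acc for acc in find_accessions(article_text) if not is_public(acc)]


if __name__ == "__main__":
    text = "Data are available under GSE12345 and SRP012345; see also GSE12345."
    # Pretend only GSE12345 has been released.
    print(overdue_accessions(text, lambda acc: acc == "GSE12345"))
```

Passing the repository check in as a callback keeps the text-mining step testable without network access; the lookup itself is left unspecified here because the record does not describe it.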


Similar items