
Is useful research data usually shared? An investigation of genome-wide association study summary statistics

Primary data collected during a research study is often shared and may be reused for new studies. To assess the extent of data sharing in favourable circumstances and whether data sharing checks can be automated, this article investigates summary statistics from primary human genome-wide association...


Bibliographic Details
Main authors: Thelwall, Mike, Munafò, Marcus, Mas-Bleda, Amalia, Stuart, Emma, Makita, Meiko, Weigert, Verena, Keene, Chris, Khan, Nushrat, Drax, Katie, Kousha, Kayvan
Format: Online Article Text
Language: English
Published: Public Library of Science 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7034915/
https://www.ncbi.nlm.nih.gov/pubmed/32084240
http://dx.doi.org/10.1371/journal.pone.0229578
_version_ 1783499969651539968
author Thelwall, Mike
Munafò, Marcus
Mas-Bleda, Amalia
Stuart, Emma
Makita, Meiko
Weigert, Verena
Keene, Chris
Khan, Nushrat
Drax, Katie
Kousha, Kayvan
author_sort Thelwall, Mike
collection PubMed
description Primary data collected during a research study is often shared and may be reused for new studies. To assess the extent of data sharing in favourable circumstances and whether data sharing checks can be automated, this article investigates summary statistics from primary human genome-wide association studies (GWAS). This type of data is highly suitable for sharing because it is a standard research output, is straightforward to use in future studies (e.g., for secondary analysis), and may be already stored in a standard format for internal sharing within multi-site research projects. Manual checks of 1799 articles from 2010 and 2017 matching a simple PubMed query for molecular epidemiology GWAS were used to identify 314 primary human GWAS papers. Of these, only 13% reported the location of a complete set of GWAS summary data, increasing from 3% in 2010 to 23% in 2017. Whilst information about whether data was shared was typically located clearly within a data availability statement, the exact nature of the shared data was usually unspecified. Thus, data sharing is the exception even in suitable research fields with relatively strong data sharing norms. Moreover, the lack of clear data descriptions within data sharing statements greatly complicates the task of automatically characterising shared data sets.
format Online
Article
Text
id pubmed-7034915
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-7034915 2020-02-27 PLoS One Research Article
Public Library of Science 2020-02-21 /pmc/articles/PMC7034915/ /pubmed/32084240 http://dx.doi.org/10.1371/journal.pone.0229578 Text en © 2020 Thelwall et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Is useful research data usually shared? An investigation of genome-wide association study summary statistics
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7034915/
https://www.ncbi.nlm.nih.gov/pubmed/32084240
http://dx.doi.org/10.1371/journal.pone.0229578