Recall and bias of retrieving gene expression microarray datasets through PubMed identifiers

Bibliographic Details
Main authors: Piwowar, Heather A; Chapman, Wendy W
Format: Text
Language: English
Published: University of Illinois at Chicago Library 2010
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2990274/
https://www.ncbi.nlm.nih.gov/pubmed/20349403
author Piwowar, Heather A
Chapman, Wendy W
collection PubMed
description BACKGROUND: The ability to locate publicly available gene expression microarray datasets effectively and efficiently facilitates the reuse of these potentially valuable resources. Centralized biomedical databases allow users to query dataset metadata descriptions, but these annotations are often too sparse and diverse to allow complex and accurate queries. In this study we examined the ability of PubMed article identifiers to locate publicly available gene expression microarray datasets, and investigated whether the retrieved datasets were representative of publicly available datasets found through statements of data sharing in the associated research articles. RESULTS: In a recent article, Ochsner and colleagues identified 397 studies that had generated gene expression microarray data. Their search of the full text of each publication for statements of data sharing revealed 203 publicly available datasets, including 179 in the Gene Expression Omnibus (GEO) or ArrayExpress databases. Our scripted search of GEO and ArrayExpress for PubMed identifiers of the same 397 studies returned 160 datasets, including six not found by the original search for data sharing statements. As a proportion of datasets found by either method, the search for data sharing statements identified 91.4% of the 209 publicly available datasets, compared to 76.6% found by our search for PubMed identifiers. Searching GEO or ArrayExpress alone retrieved 63.2% and 46.9% of all available datasets, respectively. Studies retrieved through PubMed identifiers were representative of all datasets in terms of research theme, technology, size, and impact, though the recall was highest for datasets published by the highest-impact journals. CONCLUSIONS: Searching database entries using PubMed identifiers can identify the majority of publicly available datasets. 
We urge authors of all datasets to complete the citation fields for their dataset submissions once publication details are known, thereby ensuring their work has maximum visibility and can contribute to subsequent studies.
format Text
id pubmed-2990274
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher University of Illinois at Chicago Library
record_format MEDLINE/PubMed
spelling pubmed-2990274 2010-11-29 Recall and bias of retrieving gene expression microarray datasets through PubMed identifiers Piwowar, Heather A Chapman, Wendy W J Biomed Discov Collab Research University of Illinois at Chicago Library 2010-03-28 /pmc/articles/PMC2990274/ /pubmed/20349403 Text en http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Recall and bias of retrieving gene expression microarray datasets through PubMed identifiers
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2990274/
https://www.ncbi.nlm.nih.gov/pubmed/20349403