
Precision annotation of digital samples in NCBI’s gene expression omnibus

The Gene Expression Omnibus (GEO) contains more than two million digital samples from functional genomics experiments amassed over almost two decades. However, individual sample metadata remains poorly described by unstructured free-text attributes, preventing its large-scale reanalysis. We introduce...


Bibliographic Details
Main Authors: Hadley, Dexter; Pan, James; El-Sayed, Osama; Aljabban, Jihad; Aljabban, Imad; Azad, Tej D.; Hadied, Mohamad O.; Raza, Shuaib; Rayikanti, Benjamin Abhishek; Chen, Bin; Paik, Hyojung; Aran, Dvir; Spatz, Jordan; Himmelstein, Daniel; Panahiazar, Maryam; Bhattacharya, Sanchita; Sirota, Marina; Musen, Mark A.; Butte, Atul J.
Format: Online Article Text
Language: English
Published: Nature Publishing Group 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5604135/
https://www.ncbi.nlm.nih.gov/pubmed/28925997
http://dx.doi.org/10.1038/sdata.2017.125
_version_ 1783264816034480128
author Hadley, Dexter
Pan, James
El-Sayed, Osama
Aljabban, Jihad
Aljabban, Imad
Azad, Tej D.
Hadied, Mohamad O.
Raza, Shuaib
Rayikanti, Benjamin Abhishek
Chen, Bin
Paik, Hyojung
Aran, Dvir
Spatz, Jordan
Himmelstein, Daniel
Panahiazar, Maryam
Bhattacharya, Sanchita
Sirota, Marina
Musen, Mark A.
Butte, Atul J.
author_sort Hadley, Dexter
collection PubMed
description The Gene Expression Omnibus (GEO) contains more than two million digital samples from functional genomics experiments amassed over almost two decades. However, individual sample metadata remains poorly described by unstructured free-text attributes, preventing its large-scale reanalysis. We introduce the Search Tag Analyze Resource for GEO as a web application (http://STARGEO.org) to curate better annotations of sample phenotypes uniformly across different studies, and to use these sample annotations to define robust genomic signatures of disease pathology by meta-analysis. In this paper, we target a small group of biomedical graduate students to show rapid crowd-curation of precise sample annotations across all phenotypes, and we demonstrate the biological validity of these crowd-curated annotations for breast cancer. STARGEO.org makes GEO data findable, accessible, interoperable and reusable (i.e., FAIR) to ultimately facilitate knowledge discovery. Our work demonstrates the utility of crowd-curation and interpretation of open ‘big data’ under FAIR principles as a first step towards realizing an ideal paradigm of precision medicine.
format Online
Article
Text
id pubmed-5604135
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Nature Publishing Group
record_format MEDLINE/PubMed
spelling pubmed-5604135 2017-09-28 Sci Data Article
Nature Publishing Group 2017-09-19 /pmc/articles/PMC5604135/ /pubmed/28925997 http://dx.doi.org/10.1038/sdata.2017.125 Text en Copyright © 2017, The Author(s) http://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
title Precision annotation of digital samples in NCBI’s gene expression omnibus
topic Article