Precision annotation of digital samples in NCBI’s gene expression omnibus
The Gene Expression Omnibus (GEO) contains more than two million digital samples from functional genomics experiments amassed over almost two decades. However, individual sample meta-data remains poorly described by unstructured free text attributes preventing its largescale reanalysis. We introduce...
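Main authors: | Hadley, Dexter; Pan, James; El-Sayed, Osama; Aljabban, Jihad; Aljabban, Imad; Azad, Tej D.; Hadied, Mohamad O.; Raza, Shuaib; Rayikanti, Benjamin Abhishek; Chen, Bin; Paik, Hyojung; Aran, Dvir; Spatz, Jordan; Himmelstein, Daniel; Panahiazar, Maryam; Bhattacharya, Sanchita; Sirota, Marina; Musen, Mark A.; Butte, Atul J. |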
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group, 2017 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5604135/ https://www.ncbi.nlm.nih.gov/pubmed/28925997 http://dx.doi.org/10.1038/sdata.2017.125 |
_version_ | 1783264816034480128 |
---|---|
author | Hadley, Dexter Pan, James El-Sayed, Osama Aljabban, Jihad Aljabban, Imad Azad, Tej D. Hadied, Mohamad O. Raza, Shuaib Rayikanti, Benjamin Abhishek Chen, Bin Paik, Hyojung Aran, Dvir Spatz, Jordan Himmelstein, Daniel Panahiazar, Maryam Bhattacharya, Sanchita Sirota, Marina Musen, Mark A. Butte, Atul J. |
author_facet | Hadley, Dexter Pan, James El-Sayed, Osama Aljabban, Jihad Aljabban, Imad Azad, Tej D. Hadied, Mohamad O. Raza, Shuaib Rayikanti, Benjamin Abhishek Chen, Bin Paik, Hyojung Aran, Dvir Spatz, Jordan Himmelstein, Daniel Panahiazar, Maryam Bhattacharya, Sanchita Sirota, Marina Musen, Mark A. Butte, Atul J. |
author_sort | Hadley, Dexter |
collection | PubMed |
description | The Gene Expression Omnibus (GEO) contains more than two million digital samples from functional genomics experiments amassed over almost two decades. However, individual sample meta-data remains poorly described by unstructured free text attributes preventing its largescale reanalysis. We introduce the Search Tag Analyze Resource for GEO as a web application (http://STARGEO.org) to curate better annotations of sample phenotypes uniformly across different studies, and to use these sample annotations to define robust genomic signatures of disease pathology by meta-analysis. In this paper, we target a small group of biomedical graduate students to show rapid crowd-curation of precise sample annotations across all phenotypes, and we demonstrate the biological validity of these crowd-curated annotations for breast cancer. STARGEO.org makes GEO data findable, accessible, interoperable and reusable (i.e., FAIR) to ultimately facilitate knowledge discovery. Our work demonstrates the utility of crowd-curation and interpretation of open ‘big data’ under FAIR principles as a first step towards realizing an ideal paradigm of precision medicine. |
format | Online Article Text |
id | pubmed-5604135 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | Nature Publishing Group |
record_format | MEDLINE/PubMed |
spelling | pubmed-56041352017-09-28 Precision annotation of digital samples in NCBI’s gene expression omnibus Hadley, Dexter Pan, James El-Sayed, Osama Aljabban, Jihad Aljabban, Imad Azad, Tej D. Hadied, Mohamad O. Raza, Shuaib Rayikanti, Benjamin Abhishek Chen, Bin Paik, Hyojung Aran, Dvir Spatz, Jordan Himmelstein, Daniel Panahiazar, Maryam Bhattacharya, Sanchita Sirota, Marina Musen, Mark A. Butte, Atul J. Sci Data Article The Gene Expression Omnibus (GEO) contains more than two million digital samples from functional genomics experiments amassed over almost two decades. However, individual sample meta-data remains poorly described by unstructured free text attributes preventing its largescale reanalysis. We introduce the Search Tag Analyze Resource for GEO as a web application (http://STARGEO.org) to curate better annotations of sample phenotypes uniformly across different studies, and to use these sample annotations to define robust genomic signatures of disease pathology by meta-analysis. In this paper, we target a small group of biomedical graduate students to show rapid crowd-curation of precise sample annotations across all phenotypes, and we demonstrate the biological validity of these crowd-curated annotations for breast cancer. STARGEO.org makes GEO data findable, accessible, interoperable and reusable (i.e., FAIR) to ultimately facilitate knowledge discovery. Our work demonstrates the utility of crowd-curation and interpretation of open ‘big data’ under FAIR principles as a first step towards realizing an ideal paradigm of precision medicine. 
Nature Publishing Group 2017-09-19 /pmc/articles/PMC5604135/ /pubmed/28925997 http://dx.doi.org/10.1038/sdata.2017.125 Text en Copyright © 2017, The Author(s) http://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ |
spellingShingle | Article Hadley, Dexter Pan, James El-Sayed, Osama Aljabban, Jihad Aljabban, Imad Azad, Tej D. Hadied, Mohamad O. Raza, Shuaib Rayikanti, Benjamin Abhishek Chen, Bin Paik, Hyojung Aran, Dvir Spatz, Jordan Himmelstein, Daniel Panahiazar, Maryam Bhattacharya, Sanchita Sirota, Marina Musen, Mark A. Butte, Atul J. Precision annotation of digital samples in NCBI’s gene expression omnibus |
title | Precision annotation of digital samples in NCBI’s gene expression omnibus |
title_full | Precision annotation of digital samples in NCBI’s gene expression omnibus |
title_fullStr | Precision annotation of digital samples in NCBI’s gene expression omnibus |
title_full_unstemmed | Precision annotation of digital samples in NCBI’s gene expression omnibus |
title_short | Precision annotation of digital samples in NCBI’s gene expression omnibus |
title_sort | precision annotation of digital samples in ncbi’s gene expression omnibus |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5604135/ https://www.ncbi.nlm.nih.gov/pubmed/28925997 http://dx.doi.org/10.1038/sdata.2017.125 |
work_keys_str_mv | AT hadleydexter precisionannotationofdigitalsamplesinncbisgeneexpressionomnibus AT panjames precisionannotationofdigitalsamplesinncbisgeneexpressionomnibus AT elsayedosama precisionannotationofdigitalsamplesinncbisgeneexpressionomnibus AT aljabbanjihad precisionannotationofdigitalsamplesinncbisgeneexpressionomnibus AT aljabbanimad precisionannotationofdigitalsamplesinncbisgeneexpressionomnibus AT azadtejd precisionannotationofdigitalsamplesinncbisgeneexpressionomnibus AT hadiedmohamado precisionannotationofdigitalsamplesinncbisgeneexpressionomnibus AT razashuaib precisionannotationofdigitalsamplesinncbisgeneexpressionomnibus AT rayikantibenjaminabhishek precisionannotationofdigitalsamplesinncbisgeneexpressionomnibus AT chenbin precisionannotationofdigitalsamplesinncbisgeneexpressionomnibus AT paikhyojung precisionannotationofdigitalsamplesinncbisgeneexpressionomnibus AT arandvir precisionannotationofdigitalsamplesinncbisgeneexpressionomnibus AT spatzjordan precisionannotationofdigitalsamplesinncbisgeneexpressionomnibus AT himmelsteindaniel precisionannotationofdigitalsamplesinncbisgeneexpressionomnibus AT panahiazarmaryam precisionannotationofdigitalsamplesinncbisgeneexpressionomnibus AT bhattacharyasanchita precisionannotationofdigitalsamplesinncbisgeneexpressionomnibus AT sirotamarina precisionannotationofdigitalsamplesinncbisgeneexpressionomnibus AT musenmarka precisionannotationofdigitalsamplesinncbisgeneexpressionomnibus AT butteatulj precisionannotationofdigitalsamplesinncbisgeneexpressionomnibus |