Restructured GEO: restructuring Gene Expression Omnibus metadata for genome dynamics analysis
MOTIVATION: Gene Expression Omnibus (GEO) and other publicly available data store their metadata in the format of unstructured English text, which is very difficult for automated reuse. RESULTS: We employed text mining techniques to analyze the metadata of GEO and developed Restructured GEO database...
Main authors: | Chen, Guocai; Ramírez, Juan Camilo; Deng, Nan; Qiu, Xing; Wu, Canglin; Zheng, W Jim; Wu, Hulin |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2019 |
Subjects: | Database Tool |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6333964/ https://www.ncbi.nlm.nih.gov/pubmed/30649296 http://dx.doi.org/10.1093/database/bay145 |
_version_ | 1783387645543448576 |
---|---|
author | Chen, Guocai Ramírez, Juan Camilo Deng, Nan Qiu, Xing Wu, Canglin Zheng, W Jim Wu, Hulin |
author_facet | Chen, Guocai Ramírez, Juan Camilo Deng, Nan Qiu, Xing Wu, Canglin Zheng, W Jim Wu, Hulin |
author_sort | Chen, Guocai |
collection | PubMed |
description | MOTIVATION: Gene Expression Omnibus (GEO) and other publicly available data store their metadata in the format of unstructured English text, which is very difficult for automated reuse. RESULTS: We employed text mining techniques to analyze the metadata of GEO and developed Restructured GEO database (ReGEO). ReGEO reorganizes and categorizes GEO series and makes them searchable by two new attributes extracted automatically from each series’ metadata. These attributes are the number of time points tested in the experiment and the disease being investigated. ReGEO also makes series searchable by other attributes available in GEO, such as platform organism, experiment type, associated PubMed ID as well as general keywords in the study’s description. Our approach greatly expands the usability of GEO data, demonstrating a credible approach to improve the utility of vast amount of publicly available data in the era of Big Data research. |
format | Online Article Text |
id | pubmed-6333964 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-63339642019-01-24 Restructured GEO: restructuring Gene Expression Omnibus metadata for genome dynamics analysis Chen, Guocai Ramírez, Juan Camilo Deng, Nan Qiu, Xing Wu, Canglin Zheng, W Jim Wu, Hulin Database (Oxford) Database Tool MOTIVATION: Gene Expression Omnibus (GEO) and other publicly available data store their metadata in the format of unstructured English text, which is very difficult for automated reuse. RESULTS: We employed text mining techniques to analyze the metadata of GEO and developed Restructured GEO database (ReGEO). ReGEO reorganizes and categorizes GEO series and makes them searchable by two new attributes extracted automatically from each series’ metadata. These attributes are the number of time points tested in the experiment and the disease being investigated. ReGEO also makes series searchable by other attributes available in GEO, such as platform organism, experiment type, associated PubMed ID as well as general keywords in the study’s description. Our approach greatly expands the usability of GEO data, demonstrating a credible approach to improve the utility of vast amount of publicly available data in the era of Big Data research. Oxford University Press 2019-01-16 /pmc/articles/PMC6333964/ /pubmed/30649296 http://dx.doi.org/10.1093/database/bay145 Text en © The Author(s) 2019. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Restructured GEO: restructuring Gene Expression Omnibus metadata for genome dynamics analysis |
topic | Database Tool |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6333964/ https://www.ncbi.nlm.nih.gov/pubmed/30649296 http://dx.doi.org/10.1093/database/bay145 |