Restructured GEO: restructuring Gene Expression Omnibus metadata for genome dynamics analysis

MOTIVATION: Gene Expression Omnibus (GEO) and other publicly available data resources store their metadata as unstructured English text, which makes automated reuse very difficult. RESULTS: We employed text mining techniques to analyze the metadata of GEO and developed the Restructured GEO database...

Bibliographic Details
Main authors: Chen, Guocai; Ramírez, Juan Camilo; Deng, Nan; Qiu, Xing; Wu, Canglin; Zheng, W Jim; Wu, Hulin
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6333964/
https://www.ncbi.nlm.nih.gov/pubmed/30649296
http://dx.doi.org/10.1093/database/bay145
author Chen, Guocai
Ramírez, Juan Camilo
Deng, Nan
Qiu, Xing
Wu, Canglin
Zheng, W Jim
Wu, Hulin
collection PubMed
description MOTIVATION: Gene Expression Omnibus (GEO) and other publicly available data resources store their metadata as unstructured English text, which makes automated reuse very difficult. RESULTS: We employed text mining techniques to analyze the metadata of GEO and developed the Restructured GEO database (ReGEO). ReGEO reorganizes and categorizes GEO series and makes them searchable by two new attributes extracted automatically from each series’ metadata: the number of time points tested in the experiment and the disease being investigated. ReGEO also makes series searchable by other attributes available in GEO, such as platform organism, experiment type, associated PubMed ID, and general keywords in the study’s description. Our approach greatly expands the usability of GEO data and demonstrates a credible way to improve the utility of the vast amount of publicly available data in the era of Big Data research.
format Online
Article
Text
id pubmed-6333964
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6333964 2019-01-24 Restructured GEO: restructuring Gene Expression Omnibus metadata for genome dynamics analysis Database (Oxford) Database Tool Oxford University Press 2019-01-16 /pmc/articles/PMC6333964/ /pubmed/30649296 http://dx.doi.org/10.1093/database/bay145 Text en © The Author(s) 2019. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Restructured GEO: restructuring Gene Expression Omnibus metadata for genome dynamics analysis
topic Database Tool