Ontology-Based Meta-Analysis of Global Collections of High-Throughput Public Data

BACKGROUND: The investigation of the interconnections between the molecular and genetic events that govern biological systems is essential if we are to understand the development of disease and design effective novel treatments. Microarray and next-generation sequencing technologies have the potenti...

Bibliographic Details
Main Authors: Kupershmidt, Ilya, Su, Qiaojuan Jane, Grewal, Anoop, Sundaresh, Suman, Halperin, Inbal, Flynn, James, Shekar, Mamatha, Wang, Helen, Park, Jenny, Cui, Wenwu, Wall, Gregory D., Wisotzkey, Robert, Alag, Satnam, Akhtari, Saeid, Ronaghi, Mostafa
Format: Text
Language: English
Published: Public Library of Science 2010
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2947508/
https://www.ncbi.nlm.nih.gov/pubmed/20927376
http://dx.doi.org/10.1371/journal.pone.0013066
_version_ 1782187379875905536
author Kupershmidt, Ilya
Su, Qiaojuan Jane
Grewal, Anoop
Sundaresh, Suman
Halperin, Inbal
Flynn, James
Shekar, Mamatha
Wang, Helen
Park, Jenny
Cui, Wenwu
Wall, Gregory D.
Wisotzkey, Robert
Alag, Satnam
Akhtari, Saeid
Ronaghi, Mostafa
author_sort Kupershmidt, Ilya
collection PubMed
description BACKGROUND: The investigation of the interconnections between the molecular and genetic events that govern biological systems is essential if we are to understand the development of disease and design effective novel treatments. Microarray and next-generation sequencing technologies have the potential to provide this information. However, taking full advantage of these approaches requires that biological connections be made across large quantities of highly heterogeneous genomic datasets. Leveraging the increasingly huge quantities of genomic data in the public domain is fast becoming one of the key challenges in the research community today. METHODOLOGY/RESULTS: We have developed a novel data mining framework that enables researchers to use this growing collection of public high-throughput data to investigate any set of genes or proteins. The connectivity between molecular states across thousands of heterogeneous datasets from microarrays and other genomic platforms is determined through a combination of rank-based enrichment statistics, meta-analyses, and biomedical ontologies. We address data quality concerns through dataset replication and meta-analysis and ensure that the majority of the findings are derived using multiple lines of evidence. As an example of our strategy and the utility of this framework, we apply our data mining approach to explore the biology of brown fat within the context of the thousands of publicly available gene expression datasets. CONCLUSIONS: Our work presents a practical strategy for organizing, mining, and correlating global collections of large-scale genomic data to explore normal and disease biology. Using a hypothesis-free approach, we demonstrate how a data-driven analysis across very large collections of genomic data can reveal novel discoveries and evidence to support existing hypotheses.
format Text
id pubmed-2947508
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-2947508 2010-10-06 Ontology-Based Meta-Analysis of Global Collections of High-Throughput Public Data. PLoS One. Research Article. Public Library of Science 2010-09-29 /pmc/articles/PMC2947508/ /pubmed/20927376 http://dx.doi.org/10.1371/journal.pone.0013066 Text en Kupershmidt et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Ontology-Based Meta-Analysis of Global Collections of High-Throughput Public Data
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2947508/
https://www.ncbi.nlm.nih.gov/pubmed/20927376
http://dx.doi.org/10.1371/journal.pone.0013066