Global Meta-Analysis of Transcriptomics Studies

Bibliographic Details
Main authors: Caldas, José; Vinga, Susana
Format: Online, Article, Text
Language: English
Published: Public Library of Science 2014
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3935861/
https://www.ncbi.nlm.nih.gov/pubmed/24586684
http://dx.doi.org/10.1371/journal.pone.0089318
Collection: PubMed
Abstract: Transcriptomics meta-analysis aims at re-using existing data to derive novel biological hypotheses, and is motivated by the public availability of a large number of independent studies. Current methods are based on breaking down studies into multiple comparisons between phenotypes (e.g. disease vs. healthy), based on the studies' experimental designs, followed by computing the overlap between the resulting differential expression signatures. While useful, in this methodology each study yields multiple independent phenotype comparisons, and connections are established not between studies, but rather between subsets of the studies corresponding to phenotype comparisons. We propose a rank-based statistical meta-analysis framework that establishes global connections between transcriptomics studies without breaking down studies into sets of phenotype comparisons. By using a rank product method, our framework extracts global features from each study, corresponding to genes that are consistently among the most expressed or differentially expressed genes in that study. Those features are then statistically modelled via a term-frequency inverse-document frequency (TF-IDF) model, which is then used for connecting studies. Our framework is fast and parameter-free; when applied to large collections of Homo sapiens and Streptococcus pneumoniae transcriptomics studies, it performs better than similarity-based approaches in retrieving related studies, using a Medical Subject Headings gold standard. Finally, we highlight via case studies how the framework can be used to derive novel biological hypotheses regarding related studies and the genes that drive those connections. Our proposed statistical framework shows that it is possible to perform a meta-analysis of transcriptomics studies with arbitrary experimental designs by deriving global expression features rather than decomposing studies into multiple phenotype comparisons.
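The two steps the abstract describes (rank-product feature extraction per study, then TF-IDF weighting to connect studies) can be illustrated with a small sketch. This is not the authors' implementation, and several choices are assumptions: each study is taken to be a gene-by-sample table of expression values; features are the `n_top` genes with the smallest rank product (`n_top` is a hypothetical cut-off, whereas the abstract calls the framework parameter-free, so the paper's own selection criterion must differ); term frequency is binary with IDF = log(N / df); and studies are compared by cosine similarity, a common TF-IDF choice that may not match the paper's statistical model. The sketch ranks raw expression only; ranking a per-gene differential-expression statistic instead would be one way to cover the "differentially expressed" case. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Set, Tuple

# Assumed (hypothetical) study format: gene -> list of expression values, one per sample.
Study = Dict[str, List[float]]


def rank_product(expr: Study) -> Dict[str, float]:
    """Geometric mean of each gene's within-sample ranks (rank 1 = highest value)."""
    genes = list(expr)
    n_samples = len(expr[genes[0]])
    log_rank_sum = {g: 0.0 for g in genes}
    for s in range(n_samples):
        # Rank genes within sample s, most expressed first; ties keep input order.
        ordered = sorted(genes, key=lambda g: expr[g][s], reverse=True)
        for rank, g in enumerate(ordered, start=1):
            log_rank_sum[g] += math.log(rank)
    # Summing logs avoids overflow when multiplying many ranks.
    return {g: math.exp(total / n_samples) for g, total in log_rank_sum.items()}


def top_features(expr: Study, n_top: int) -> Set[str]:
    """Genes with the smallest rank products (n_top is a hypothetical cut-off)."""
    rp = rank_product(expr)
    return set(sorted(rp, key=lambda g: (rp[g], g))[:n_top])


def tfidf_vectors(features: Dict[str, Set[str]]) -> Dict[str, Dict[str, float]]:
    """Binary TF times IDF = log(N / df); genes play the role of terms (assumed weighting)."""
    n_studies = len(features)
    df = Counter(g for genes in features.values() for g in genes)
    return {
        study: {g: math.log(n_studies / df[g]) for g in genes}
        for study, genes in features.items()
    }


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity of two sparse weight vectors; 0.0 if either is all zeros."""
    dot = sum(w * v[g] for g, w in u.items() if g in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def most_similar(vecs: Dict[str, Dict[str, float]], query: str) -> List[Tuple[str, float]]:
    """Other studies ranked by cosine similarity to `query` (assumed scoring rule)."""
    scores = [(s, cosine(vecs[query], v)) for s, v in vecs.items() if s != query]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Toy data (invented for illustration): studies A and B share their top genes; C does not.
STUDIES: Dict[str, Study] = {
    "A": {"g1": [9, 8], "g2": [7, 9], "g3": [1, 2], "g4": [2, 1], "g5": [5, 5]},
    "B": {"g1": [8, 9], "g2": [9, 7], "g3": [3, 1], "g4": [1, 3], "g5": [4, 4]},
    "C": {"g1": [1, 2], "g2": [2, 1], "g3": [9, 8], "g4": [8, 9], "g5": [5, 5]},
}

if __name__ == "__main__":
    features = {name: top_features(expr, n_top=2) for name, expr in STUDIES.items()}
    vecs = tfidf_vectors(features)
    print(most_similar(vecs, "A"))
```

In this toy run, A and B share their top-ranked genes and so score close to 1, while C shares none with A and scores 0. A gene that is a feature of every study receives an IDF weight of zero, which is how the TF-IDF step down-weights genes that are highly expressed everywhere and therefore say little about which studies are related.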
Record ID: pubmed-3935861
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: PLoS One (Research Article)
Publication date: 2014-02-26
Copyright: © 2014 Caldas, Vinga. This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.