Global Meta-Analysis of Transcriptomics Studies
Transcriptomics meta-analysis aims at re-using existing data to derive novel biological hypotheses, and is motivated by the public availability of a large number of independent studies. Current methods are based on breaking down studies into multiple comparisons between phenotypes (e.g. disease vs....
Main authors: | Caldas, José; Vinga, Susana |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2014 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3935861/ https://www.ncbi.nlm.nih.gov/pubmed/24586684 http://dx.doi.org/10.1371/journal.pone.0089318 |
_version_ | 1782305232380755968 |
---|---|
author | Caldas, José; Vinga, Susana |
author_facet | Caldas, José; Vinga, Susana |
author_sort | Caldas, José |
collection | PubMed |
description | Transcriptomics meta-analysis aims at re-using existing data to derive novel biological hypotheses, and is motivated by the public availability of a large number of independent studies. Current methods are based on breaking down studies into multiple comparisons between phenotypes (e.g. disease vs. healthy), based on the studies' experimental designs, followed by computing the overlap between the resulting differential expression signatures. While useful, in this methodology each study yields multiple independent phenotype comparisons, and connections are established not between studies, but rather between subsets of the studies corresponding to phenotype comparisons. We propose a rank-based statistical meta-analysis framework that establishes global connections between transcriptomics studies without breaking down studies into sets of phenotype comparisons. By using a rank product method, our framework extracts global features from each study, corresponding to genes that are consistently among the most expressed or differentially expressed genes in that study. Those features are then statistically modelled via a term-frequency inverse-document frequency (TF-IDF) model, which is then used for connecting studies. Our framework is fast and parameter-free; when applied to large collections of Homo sapiens and Streptococcus pneumoniae transcriptomics studies, it performs better than similarity-based approaches in retrieving related studies, using a Medical Subject Headings gold standard. Finally, we highlight via case studies how the framework can be used to derive novel biological hypotheses regarding related studies and the genes that drive those connections. Our proposed statistical framework shows that it is possible to perform a meta-analysis of transcriptomics studies with arbitrary experimental designs by deriving global expression features rather than decomposing studies into multiple phenotype comparisons. |
format | Online Article Text |
id | pubmed-3935861 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-3935861 2014-03-04 Global Meta-Analysis of Transcriptomics Studies Caldas, José; Vinga, Susana PLoS One Research Article Public Library of Science 2014-02-26 /pmc/articles/PMC3935861/ /pubmed/24586684 http://dx.doi.org/10.1371/journal.pone.0089318 Text en © 2014 Caldas, Vinga http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
spellingShingle | Research Article Caldas, José Vinga, Susana Global Meta-Analysis of Transcriptomics Studies |
title | Global Meta-Analysis of Transcriptomics Studies |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3935861/ https://www.ncbi.nlm.nih.gov/pubmed/24586684 http://dx.doi.org/10.1371/journal.pone.0089318 |
work_keys_str_mv | AT caldasjose globalmetaanalysisoftranscriptomicsstudies AT vingasusana globalmetaanalysisoftranscriptomicsstudies |
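
The description above outlines two steps: a rank product that extracts global gene features from each study, and a TF-IDF model that connects studies through those features. The following is a minimal Python sketch of that general idea, not the authors' implementation. Several things in it are assumptions made for illustration: the toy expression values, the function names, the fixed cutoff `k` (the described framework is parameter-free, so a cutoff is a simplification here), the binary term frequency, and the use of cosine similarity to compare studies.

```python
import math
from collections import Counter


def rank_product_scores(samples):
    """Geometric mean of each gene's rank across samples (rank 1 = highest value)."""
    genes = list(samples[0])
    log_sums = dict.fromkeys(genes, 0.0)
    for sample in samples:
        ordered = sorted(genes, key=lambda g: sample[g], reverse=True)
        for rank, gene in enumerate(ordered, start=1):
            log_sums[gene] += math.log(rank)
    n = len(samples)
    return {g: math.exp(s / n) for g, s in log_sums.items()}


def top_genes(samples, k):
    """Genes with the k smallest rank products (hypothetical fixed cutoff)."""
    scores = rank_product_scores(samples)
    return sorted(scores, key=scores.get)[:k]


def tfidf_vectors(study_genes):
    """Treat each study as a document whose terms are its feature genes.

    Term frequency is binary (a gene is a feature or not), so the weight of a
    gene is its inverse document frequency, log(N / df).
    """
    n = len(study_genes)
    df = Counter(g for genes in study_genes.values() for g in set(genes))
    return {
        study: {g: math.log(n / df[g]) for g in set(genes)}
        for study, genes in study_genes.items()
    }


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[g] for g, w in u.items() if g in v)
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(
        sum(w * w for w in v.values())
    )
    return dot / norm if norm else 0.0


# Toy data (invented): three studies, two samples each, four genes.
studies = {
    "A": [{"g1": 9, "g2": 8, "g3": 1, "g4": 2}, {"g1": 8, "g2": 9, "g3": 2, "g4": 1}],
    "B": [{"g1": 7, "g2": 1, "g3": 9, "g4": 2}, {"g1": 9, "g2": 2, "g3": 8, "g4": 1}],
    "C": [{"g1": 1, "g2": 2, "g3": 3, "g4": 9}, {"g1": 2, "g2": 1, "g3": 4, "g4": 8}],
}

features = {name: top_genes(samples, k=2) for name, samples in studies.items()}
vectors = tfidf_vectors(features)

for a, b in [("A", "B"), ("A", "C"), ("B", "C")]:
    print(a, b, round(cosine(vectors[a], vectors[b]), 3))
```

In this toy setting, studies that share a feature gene receive a positive similarity and studies with no shared feature genes receive zero. Genes that are features in few studies get a larger weight than genes that are features in many, which is the usual motivation for inverse document frequency weighting.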