Strategies for aggregating gene expression data: The collapseRows R function
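Main authors: | Miller, Jeremy A; Cai, Chaochao; Langfelder, Peter; Geschwind, Daniel H; Kurian, Sunil M; Salomon, Daniel R; Horvath, Steve |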
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2011 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3166942/ https://www.ncbi.nlm.nih.gov/pubmed/21816037 http://dx.doi.org/10.1186/1471-2105-12-322 |
author | Miller, Jeremy A; Cai, Chaochao; Langfelder, Peter; Geschwind, Daniel H; Kurian, Sunil M; Salomon, Daniel R; Horvath, Steve |
collection | PubMed |
description | BACKGROUND: Genomic and other high dimensional analyses often require one to summarize multiple related variables by a single representative. This task is also variously referred to as collapsing, combining, reducing, or aggregating variables. Examples include summarizing several probe measurements corresponding to a single gene, representing the expression profiles of a co-expression module by a single expression profile, and aggregating cell-type marker information to de-convolute expression data. Several standard statistical summary techniques can be used, but network methods also provide useful alternative methods to find representatives. Currently, few collapsing functions have been developed and widely applied. RESULTS: We introduce the R function collapseRows that implements several collapsing methods and evaluate its performance in three applications. First, we study a crucial step of the meta-analysis of microarray data: the merging of independent gene expression data sets, which may have been measured on different platforms. Toward this end, we collapse multiple microarray probes for a single gene and then merge the data by gene identifier. We find that choosing the probe with the highest average expression leads to the best between-study consistency. Second, we study methods for summarizing the gene expression profiles of a co-expression module. Several gene co-expression network analysis applications show that the optimal collapsing strategy depends on the analysis goal. Third, we study aggregating the information of cell type marker genes when the aim is to predict the abundance of cell types in a tissue sample based on gene expression data ("expression deconvolution"). We apply different collapsing methods to predict cell type abundances in peripheral human blood and in mixtures of blood cell lines. Interestingly, the most accurate prediction method involves choosing the most highly connected "hub" marker gene. 
Finally, to facilitate biological interpretation of collapsed gene lists, we introduce the function userListEnrichment, which assesses the enrichment of gene lists for known brain and blood cell type markers, and for other published biological pathways. CONCLUSIONS: The R function collapseRows implements several standard and network-based collapsing methods. In various genomic applications we provide evidence that both types of methods are robust and biologically relevant tools. |
id | pubmed-3166942 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
source | BMC Bioinformatics, Methodology Article, published 2011-08-04 |
rights | Copyright ©2011 Miller et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Strategies for aggregating gene expression data: The collapseRows R function |
topic | Methodology Article |
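Two of the collapsing strategies summarized in the description above, keeping the probe with the highest average expression for each gene and choosing the most highly connected "hub" row, can be illustrated with a minimal Python sketch. It is not the collapseRows API, which is an R function. The names `collapse_max_mean`, `pearson` and `hub_row`, the dictionary-based data layout, and the definition of connectivity as the sum of absolute Pearson correlations with the other rows are all assumptions made for illustration. The toy inputs are invented and are not data from the study.

```python
from math import sqrt
from statistics import mean


def collapse_max_mean(expr, probe_to_gene):
    """For each gene, keep the probe whose row has the highest average expression.

    expr: dict mapping a (hypothetical) probe id to its per-sample values.
    probe_to_gene: dict mapping each probe id to a gene identifier.
    Returns a dict mapping gene -> (chosen probe id, that probe's values).
    Ties are resolved in favour of the probe seen first.
    """
    best = {}  # gene -> (mean expression, probe id)
    for probe, values in expr.items():
        gene = probe_to_gene[probe]
        m = mean(values)
        if gene not in best or m > best[gene][0]:
            best[gene] = (m, probe)
    return {gene: (probe, expr[probe]) for gene, (_, probe) in best.items()}


def pearson(x, y):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    mx, my = mean(x), mean(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: treat a constant row as unconnected
    return sum(a * b for a, b in zip(dx, dy)) / sqrt(sxx * syy)


def hub_row(rows):
    """Return the name of the most highly connected row.

    Connectivity is assumed here to be the sum of absolute Pearson
    correlations with every other row (an unweighted choice for illustration).
    """
    names = list(rows)

    def connectivity(name):
        return sum(abs(pearson(rows[name], rows[other]))
                   for other in names if other != name)

    return max(names, key=connectivity)


if __name__ == "__main__":
    # Toy data: probes p1 and p2 both map to gene A; p3 maps to gene B.
    expr = {"p1": [1.0, 2.0, 3.0], "p2": [5.0, 6.0, 7.0], "p3": [2.0, 2.0, 2.0]}
    probe_to_gene = {"p1": "A", "p2": "A", "p3": "B"}
    print(collapse_max_mean(expr, probe_to_gene))
    # {'A': ('p2', [5.0, 6.0, 7.0]), 'B': ('p3', [2.0, 2.0, 2.0])}

    # "h" is correlated with both "u" and "v", which are uncorrelated with each other.
    rows = {"u": [1, -1, 0, 0], "v": [0, 0, 1, -1], "h": [1, -1, 1, -1]}
    print(hub_row(rows))  # h
```

Selecting a single existing row, rather than averaging rows, keeps the representative an actual measured profile. The abstract reports that this kind of selection performed well both for merging data sets (highest mean) and for expression deconvolution (hub marker gene).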