
Selection of Higher Order Regression Models in the Analysis of Multi-Factorial Transcription Data

INTRODUCTION: Many studies examine gene expression data that has been obtained under the influence of multiple factors, such as genetic background, environmental conditions, or exposure to diseases. The interplay of multiple factors may lead to effect modification and confounding. Higher order linea...


Bibliographic Details
Main authors: Prazeres da Costa, Olivia; Hoffman, Arthur; Rey, Johannes W.; Mansmann, Ulrich; Buch, Thorsten; Tresch, Achim
Format: Online Article Text
Language: English
Published: Public Library of Science 2014
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3962375/
https://www.ncbi.nlm.nih.gov/pubmed/24658540
http://dx.doi.org/10.1371/journal.pone.0091840
author Prazeres da Costa, Olivia
Hoffman, Arthur
Rey, Johannes W.
Mansmann, Ulrich
Buch, Thorsten
Tresch, Achim
collection PubMed
description INTRODUCTION: Many studies examine gene expression data that has been obtained under the influence of multiple factors, such as genetic background, environmental conditions, or exposure to diseases. The interplay of multiple factors may lead to effect modification and confounding. Higher order linear regression models can account for these effects. We present a new methodology for linear model selection and apply it to microarray data of bone marrow-derived macrophages. This experiment investigates the influence of three variable factors: the genetic background of the mice from which the macrophages were obtained, Yersinia enterocolitica infection (two strains, and a mock control), and treatment/non-treatment with interferon-γ. RESULTS: We set up four different linear regression models in a hierarchical order. We introduce the eruption plot as a new practical tool for model selection complementary to global testing. It visually compares the size and significance of effect estimates between two nested models. Using this methodology we were able to select the most appropriate model by keeping only relevant factors showing additional explanatory power. Application to experimental data allowed us to qualify the interaction of factors as either neutral (no interaction), alleviating (co-occurring effects are weaker than expected from the single effects), or aggravating (stronger than expected). We find a biologically meaningful gene cluster of putative C2TA target genes that appear to be co-regulated with MHC class II genes. CONCLUSIONS: We introduced the eruption plot as a tool for visual model comparison to identify relevant higher order interactions in the analysis of expression data obtained under the influence of multiple factors. We conclude that model selection in higher order linear regression models should generally be performed for the analysis of multi-factorial microarray data.
format Online
Article
Text
id pubmed-3962375
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3962375 2014-03-24 Selection of Higher Order Regression Models in the Analysis of Multi-Factorial Transcription Data Prazeres da Costa, Olivia Hoffman, Arthur Rey, Johannes W. Mansmann, Ulrich Buch, Thorsten Tresch, Achim PLoS One Research Article Public Library of Science 2014-03-21 /pmc/articles/PMC3962375/ /pubmed/24658540 http://dx.doi.org/10.1371/journal.pone.0091840 Text en © 2014 Prazeres da Costa et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Selection of Higher Order Regression Models in the Analysis of Multi-Factorial Transcription Data
topic Research Article