Investigating the effect of dependence between conditions with Bayesian Linear Mixed Models for motif activity analysis
MOTIVATION: Cellular identity and behavior are controlled by complex gene regulatory networks. Transcription factors (TFs) bind to specific DNA sequences to regulate the transcription of their target genes. On the basis of these TF motifs in cis-regulatory elements we can model the influence of TFs o...
Main authors: | Lederer, Simone; Heskes, Tom; van Heeringen, Simon J.; Albers, Cornelis A. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2020 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7194367/ https://www.ncbi.nlm.nih.gov/pubmed/32357166 http://dx.doi.org/10.1371/journal.pone.0231824 |
_version_ | 1783528330900799488 |
---|---|
author | Lederer, Simone Heskes, Tom van Heeringen, Simon J. Albers, Cornelis A. |
author_sort | Lederer, Simone |
collection | PubMed |
description | MOTIVATION: Cellular identity and behavior are controlled by complex gene regulatory networks. Transcription factors (TFs) bind to specific DNA sequences to regulate the transcription of their target genes. On the basis of these TF motifs in cis-regulatory elements we can model the influence of TFs on gene expression. In such models of TF motif activity the data is usually modeled assuming a linear relationship between the motif activity and the gene expression level. A commonly used method to model motif influence is based on Ridge Regression. One important assumption of linear regression is the independence between samples. However, if samples are generated from the same cell line, tissue, or other biological source, this assumption may be invalid. This same assumption of independence is also applied to different yet similar experimental conditions, which may also be inappropriate. In theory, the independence assumption between samples could lead to a loss in signal detection. Here we investigate whether a Bayesian model that allows for correlations results in more accurate inference of motif activities. RESULTS: We extend Ridge Regression to a Bayesian Linear Mixed Model, which allows us to model dependence between different samples. In a simulation study, we investigate the differences between the two model assumptions. We show that our Bayesian Linear Mixed Model implementation outperforms Ridge Regression in a simulation scenario where the noise, which is the signal that cannot be explained by TF motifs, is uncorrelated. However, we demonstrate that there is no such gain in performance if the noise has a similar covariance structure over samples as the signal that can be explained by motifs. We give a mathematical explanation of why this is the case. Using four representative real datasets we show that at most ~40% of the signal is explained by motifs using the linear model. With these data there is no advantage to using the Bayesian Linear Mixed Model, due to the similarity of the covariance structure. AVAILABILITY & IMPLEMENTATION: The project implementation is available at https://github.com/Sim19/SimGEXPwMotifs. |
format | Online Article Text |
id | pubmed-7194367 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-7194367 2020-05-11 Investigating the effect of dependence between conditions with Bayesian Linear Mixed Models for motif activity analysis Lederer, Simone Heskes, Tom van Heeringen, Simon J. Albers, Cornelis A. PLoS One Research Article Public Library of Science 2020-05-01 /pmc/articles/PMC7194367/ /pubmed/32357166 http://dx.doi.org/10.1371/journal.pone.0231824 Text en © 2020 Lederer et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
title | Investigating the effect of dependence between conditions with Bayesian Linear Mixed Models for motif activity analysis |
topic | Research Article |