Investigating the effect of dependence between conditions with Bayesian Linear Mixed Models for motif activity analysis

Bibliographic Details
Main authors: Lederer, Simone, Heskes, Tom, van Heeringen, Simon J., Albers, Cornelis A.
Format: Online, Article, Text
Language: English
Published: Public Library of Science 2020
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7194367/
https://www.ncbi.nlm.nih.gov/pubmed/32357166
http://dx.doi.org/10.1371/journal.pone.0231824
collection PubMed
description MOTIVATION: Cellular identity and behavior are controlled by complex gene regulatory networks. Transcription factors (TFs) bind to specific DNA sequences to regulate the transcription of their target genes. On the basis of these TF motifs in cis-regulatory elements, we can model the influence of TFs on gene expression. In such models of TF motif activity, the data are usually modeled assuming a linear relationship between motif activity and gene expression level. A commonly used method to model motif influence is based on Ridge Regression. One important assumption of linear regression is independence between samples. However, if samples are generated from the same cell line, tissue, or other biological source, this assumption may be invalid. The same independence assumption is also applied to different yet similar experimental conditions, where it may be equally inappropriate. In theory, assuming independence between samples could lead to a loss in signal detection. Here we investigate whether a Bayesian model that allows for correlations results in more accurate inference of motif activities. RESULTS: We extend Ridge Regression to a Bayesian Linear Mixed Model, which allows us to model dependence between different samples. In a simulation study, we investigate the differences between the two model assumptions. We show that our Bayesian Linear Mixed Model implementation outperforms Ridge Regression in a simulation scenario where the noise (the part of the signal that cannot be explained by TF motifs) is uncorrelated. However, we demonstrate that there is no such gain in performance if the noise has a covariance structure over samples similar to that of the signal explained by motifs. We give a mathematical explanation of why this is the case. Using four representative real datasets, we show that at most ∼40% of the signal is explained by motifs using the linear model.
With these data, there is no advantage in using the Bayesian Linear Mixed Model, because the covariance structures of the noise and the motif-explained signal are similar. AVAILABILITY & IMPLEMENTATION: The project implementation is available at https://github.com/Sim19/SimGEXPwMotifs.
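The abstract states that the Bayesian Linear Mixed Model offers no gain over Ridge Regression when the noise shares the covariance structure of the motif-explained signal. The NumPy sketch below illustrates one way this can happen, under assumptions of our own: a matrix-normal model Y = M A + E with a genes × motifs matrix M, a prior on the activities A with sample covariance Σ scaled by σ², and noise with sample covariance Ω scaled by τ². The function names `ridge_activities` and `blmm_posterior_mean` are hypothetical and are not taken from the SimGEXPwMotifs repository; this is an illustration, not the authors' derivation or implementation. It shows that when Ω = Σ, the posterior mean of A coincides with Ridge Regression fitted column by column with λ = τ²/σ².

```python
import numpy as np


def ridge_activities(M, Y, lam):
    """Ridge Regression fitted independently per sample (column of Y).

    Returns A_hat = (M^T M + lam * I)^{-1} M^T Y, shape (motifs, samples).
    """
    n_motifs = M.shape[1]
    return np.linalg.solve(M.T @ M + lam * np.eye(n_motifs), M.T @ Y)


def blmm_posterior_mean(M, Y, sigma2, tau2, Sigma, Omega):
    """Posterior mean of A in the assumed model Y = M A + E, where
    vec(A) ~ N(0, sigma2 * kron(Sigma, I_P)) and
    vec(E) ~ N(0, tau2 * kron(Omega, I_G)); vec() stacks columns.
    Sigma and Omega are (samples x samples) covariance matrices.
    """
    n_genes, n_motifs = M.shape
    n_samples = Y.shape[1]
    X = np.kron(np.eye(n_samples), M)  # vec(M A) = (I_C kron M) vec(A)
    noise_prec = np.kron(np.linalg.inv(Omega), np.eye(n_genes)) / tau2
    prior_prec = np.kron(np.linalg.inv(Sigma), np.eye(n_motifs)) / sigma2
    post_prec = prior_prec + X.T @ noise_prec @ X
    vec_y = Y.reshape(-1, order="F")  # column-stacking vec(Y)
    vec_a = np.linalg.solve(post_prec, X.T @ noise_prec @ vec_y)
    return vec_a.reshape(n_motifs, n_samples, order="F")


# Toy data (hypothetical sizes and values, not from the paper).
rng = np.random.default_rng(0)
G, P, C = 20, 4, 3  # genes, motifs, samples
M = rng.normal(size=(G, P))  # motif scores per gene
Y = rng.normal(size=(G, C))  # expression per gene and sample
sigma2, tau2 = 0.5, 1.0
B = rng.normal(size=(C, C))
Sigma = B @ B.T + C * np.eye(C)  # an arbitrary positive-definite sample covariance

A_ridge = ridge_activities(M, Y, lam=tau2 / sigma2)
A_same = blmm_posterior_mean(M, Y, sigma2, tau2, Sigma, Omega=Sigma)
A_white = blmm_posterior_mean(M, Y, sigma2, tau2, Sigma, Omega=np.eye(C))

print(np.allclose(A_same, A_ridge))   # True: noise covariance equals signal covariance
print(np.allclose(A_white, A_ridge))  # generally False: uncorrelated noise
```

The equivalence follows because, with Ω = Σ, the posterior precision factorizes as Σ⁻¹ ⊗ (σ⁻²I + τ⁻²MᵀM), so the sample covariance cancels from the posterior mean. The sketch builds Kronecker products explicitly and is therefore only meant for toy sizes.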
id pubmed-7194367
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-7194367 2020-05-11 PLoS One, Research Article. Public Library of Science 2020-05-01. © 2020 Lederer et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.