
On the problem of confounders in modeling gene expression


Bibliographic Details
Main authors:	Schmidt, Florian; Schulz, Marcel H
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2019
Subjects:	Review
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6530814/ https://www.ncbi.nlm.nih.gov/pubmed/30084962 http://dx.doi.org/10.1093/bioinformatics/bty674

collection	PubMed
id	pubmed-6530814
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
journal	Bioinformatics
dates	2019-02-15 2018-08-02
description	MOTIVATION: Modeling of Transcription Factor (TF) binding from both ChIP-seq and chromatin accessibility data has become prevalent in computational biology. Several models have been proposed to generate new hypotheses on transcriptional regulation. However, there is no distinct approach to derive TF binding scores from ChIP-seq and open chromatin experiments. Here, we review biases of various scoring approaches and their effects on the interpretation and reliability of predictive gene expression models. RESULTS: We generated predictive models for gene expression using ChIP-seq and DNase1-seq data from DEEP and ENCODE. Via randomization experiments, we identified confounders in TF gene scores derived from both ChIP-seq and DNase1-seq data. We reviewed correction approaches for both data types, which reduced the influence of identified confounders without harm to model performance. Also, our analyses highlighted further quality control measures, in addition to model performance, that may help to assure model reliability and to avoid misinterpretation in future studies. AVAILABILITY AND IMPLEMENTATION: The software used in this study is available online at https://github.com/SchulzLab/TEPIC. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
license	© The Author(s) 2018. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
