MixChIP: a probabilistic method for cell type specific protein-DNA binding analysis


Bibliographic Details
Main Authors: Rautio, Sini; Lähdesmäki, Harri
Format: Online, Article, Text
Language: English
Published: BioMed Central, 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4690251/
https://www.ncbi.nlm.nih.gov/pubmed/26703974
http://dx.doi.org/10.1186/s12859-015-0834-3
collection PubMed
description BACKGROUND: Transcription factors (TFs) are proteins that bind to DNA and regulate gene expression. To understand details of gene regulation, characterizing TF binding sites in different cell types, diseases and among individuals is essential. However, sometimes TF binding can only be measured from biological samples that contain multiple cell or tissue types. Sample heterogeneity can have a considerable effect on TF binding site detection. While manual separation techniques can be used to isolate a cell type of interest from heterogeneous samples, such techniques are challenging and can change intra-cellular interactions, including protein-DNA binding. Computational deconvolution methods have emerged as an alternative strategy to study heterogeneous samples and numerous methods have been proposed to analyze gene expression. However, no computational method exists to deconvolve cell type specific TF binding from heterogeneous samples.

RESULTS: We present a probabilistic method, MixChIP, to identify cell type specific TF binding sites from heterogeneous chromatin immunoprecipitation sequencing (ChIP-seq) data. Our method simultaneously estimates the binding strength in different cell types as well as the proportions of different cell types in each sample when only partial prior information about cell type composition is available. We demonstrate the utility of MixChIP by analyzing ChIP-seq data from two cell lines which we artificially mix to generate (simulated) heterogeneous samples and by analyzing ChIP-seq data from breast cancer patients measuring oestrogen receptor (ER) binding in primary breast cancer tissues. We show that MixChIP is more accurate in detecting TF binding sites from multiple heterogeneous ChIP-seq samples than the standard methods which do not account for sample heterogeneity.

CONCLUSIONS: Our results show that MixChIP can estimate cell-type proportions and identify cell type specific TF binding sites from heterogeneous ChIP-seq samples. Thus, MixChIP can be an invaluable tool in analyzing heterogeneous ChIP-seq samples, such as those originating from cancer studies. R implementation is available at http://research.ics.aalto.fi/csb/software/mixchip/.

ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0834-3) contains supplementary material, which is available to authorized users.
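The abstract says MixChIP jointly estimates cell-type-specific binding strengths and per-sample cell-type proportions, but this record does not give the model itself. The following is therefore only a hypothetical sketch under a simple linear-mixing assumption: the normalized signal of sample s at site i is a proportion-weighted sum of cell-type binding strengths, Y = W B, with W (samples × cell types, rows summing to 1) and B (cell types × sites). Under that assumption each factor can be estimated from the other by least squares. The function names `mix`, `estimate_binding` and `estimate_proportions` are illustrative and are not part of the MixChIP R package; MixChIP is a probabilistic model and may differ substantially.

```python
"""Hypothetical linear-mixing sketch of ChIP-seq deconvolution (assumed model, not MixChIP's)."""
from typing import List

Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def least_squares(x: Matrix, y: List[float]) -> List[float]:
    """Return beta minimising ||y - x @ beta||^2 via the normal equations."""
    k = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(x, y)) for i in range(k)]
    return solve(xtx, xty)


def mix(w: Matrix, b: Matrix) -> Matrix:
    """Assumed forward model: y[s][i] = sum_k w[s][k] * b[k][i]."""
    return [[sum(w[s][k] * b[k][i] for k in range(len(b))) for i in range(len(b[0]))]
            for s in range(len(w))]


def estimate_binding(w: Matrix, y: Matrix) -> Matrix:
    """Given proportions w (samples x cell types) and signal y (samples x sites),
    estimate binding strengths (cell types x sites); negatives are clipped to zero."""
    n_sites = len(y[0])
    per_site = [least_squares(w, [row[i] for row in y]) for i in range(n_sites)]
    return [[max(0.0, per_site[i][k]) for i in range(n_sites)] for k in range(len(w[0]))]


def estimate_proportions(b: Matrix, y: Matrix) -> Matrix:
    """Given binding strengths b, fit each sample's proportions, clip, and renormalise to sum 1."""
    design = [list(col) for col in zip(*b)]  # sites x cell types
    out = []
    for row in y:
        p = [max(0.0, v) for v in least_squares(design, row)]
        total = sum(p)
        if total == 0.0:
            raise ValueError("all proportion estimates were clipped to zero")
        out.append([v / total for v in p])
    return out
```

For example, with two cell types, `W = [[0.8, 0.2], [0.3, 0.7], [0.5, 0.5]]` and `B = [[10, 0, 4, 6], [0, 8, 4, 2]]`, `estimate_binding(W, mix(W, B))` recovers `B` and `estimate_proportions(B, mix(W, B))` recovers `W` up to rounding. Alternating the two steps gives a crude factorization in the spirit of nonnegative matrix factorization; clipping is a simplification rather than true nonnegative least squares, and without prior information on W the factorization is in general not unique, which plausibly is why partial prior knowledge of cell-type composition is useful.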
id pubmed-4690251
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal BMC Bioinformatics (Research Article), BioMed Central, published 2015-12-24
license © Rautio and Lähdesmäki. 2015. Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
topic Research Article