msCentipede: Modeling Heterogeneity across Genomic Sites and Replicates Improves Accuracy in the Inference of Transcription Factor Binding
Understanding global gene regulation depends critically on accurate annotation of regulatory elements that are functional in a given cell type. CENTIPEDE, a powerful, probabilistic framework for identifying transcription factor binding sites from tissue-specific DNase I cleavage patterns and genomic...
Main Authors: | Raj, Anil; Shim, Heejung; Gilad, Yoav; Pritchard, Jonathan K.; Stephens, Matthew |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2015 |
Subjects: | Research Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4583425/ https://www.ncbi.nlm.nih.gov/pubmed/26406244 http://dx.doi.org/10.1371/journal.pone.0138030 |
_version_ | 1782391845756600320 |
---|---|
author | Raj, Anil Shim, Heejung Gilad, Yoav Pritchard, Jonathan K. Stephens, Matthew |
author_sort | Raj, Anil |
collection | PubMed |
description | Understanding global gene regulation depends critically on accurate annotation of regulatory elements that are functional in a given cell type. CENTIPEDE, a powerful, probabilistic framework for identifying transcription factor binding sites from tissue-specific DNase I cleavage patterns and genomic sequence content, leverages the hypersensitivity of factor-bound chromatin and the information in the DNase I spatial cleavage profile characteristic of each DNA binding protein to accurately infer functional factor binding sites. However, the model for the spatial profile in this framework fails to account for the substantial variation in the DNase I cleavage profiles across different binding sites. Neither does it account for variation in the profiles at the same binding site across multiple replicate DNase I experiments, which are increasingly available. In this work, we introduce new methods, based on multi-scale models for inhomogeneous Poisson processes, to account for such variation in DNase I cleavage patterns both within and across binding sites. These models account for the spatial structure in the heterogeneity in DNase I cleavage patterns for each factor. Using DNase-seq measurements assayed in a lymphoblastoid cell line, we demonstrate the improved performance of this model for several transcription factors by comparing against the ChIP-seq peaks for those factors. Finally, we explore the effects of DNase I sequence bias on inference of factor binding using a simple extension to our framework that allows for a more flexible background model. The proposed model can also be easily applied to paired-end ATAC-seq and DNase-seq data. msCentipede, a Python implementation of our algorithm, is available at http://rajanil.github.io/msCentipede. |
format | Online Article Text |
id | pubmed-4583425 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-4583425 2015-10-02 msCentipede: Modeling Heterogeneity across Genomic Sites and Replicates Improves Accuracy in the Inference of Transcription Factor Binding Raj, Anil Shim, Heejung Gilad, Yoav Pritchard, Jonathan K. Stephens, Matthew PLoS One Research Article Public Library of Science 2015-09-25 /pmc/articles/PMC4583425/ /pubmed/26406244 http://dx.doi.org/10.1371/journal.pone.0138030 Text en © 2015 Raj et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
title | msCentipede: Modeling Heterogeneity across Genomic Sites and Replicates Improves Accuracy in the Inference of Transcription Factor Binding |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4583425/ https://www.ncbi.nlm.nih.gov/pubmed/26406244 http://dx.doi.org/10.1371/journal.pone.0138030 |