msCentipede: Modeling Heterogeneity across Genomic Sites and Replicates Improves Accuracy in the Inference of Transcription Factor Binding

Understanding global gene regulation depends critically on accurate annotation of regulatory elements that are functional in a given cell type. CENTIPEDE, a powerful, probabilistic framework for identifying transcription factor binding sites from tissue-specific DNase I cleavage patterns and genomic...

Bibliographic Details
Main Authors: Raj, Anil; Shim, Heejung; Gilad, Yoav; Pritchard, Jonathan K.; Stephens, Matthew
Format: Online Article Text
Language: English
Published: Public Library of Science 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4583425/
https://www.ncbi.nlm.nih.gov/pubmed/26406244
http://dx.doi.org/10.1371/journal.pone.0138030
_version_ 1782391845756600320
author Raj, Anil
Shim, Heejung
Gilad, Yoav
Pritchard, Jonathan K.
Stephens, Matthew
collection PubMed
description Understanding global gene regulation depends critically on accurate annotation of regulatory elements that are functional in a given cell type. CENTIPEDE, a powerful, probabilistic framework for identifying transcription factor binding sites from tissue-specific DNase I cleavage patterns and genomic sequence content, leverages the hypersensitivity of factor-bound chromatin and the information in the DNase I spatial cleavage profile characteristic of each DNA binding protein to accurately infer functional factor binding sites. However, the model for the spatial profile in this framework fails to account for the substantial variation in the DNase I cleavage profiles across different binding sites. Neither does it account for variation in the profiles at the same binding site across multiple replicate DNase I experiments, which are increasingly available. In this work, we introduce new methods, based on multi-scale models for inhomogeneous Poisson processes, to account for such variation in DNase I cleavage patterns both within and across binding sites. These models account for the spatial structure in the heterogeneity in DNase I cleavage patterns for each factor. Using DNase-seq measurements assayed in a lymphoblastoid cell line, we demonstrate the improved performance of this model for several transcription factors by comparing against the ChIP-seq peaks for those factors. Finally, we explore the effects of DNase I sequence bias on inference of factor binding using a simple extension to our framework that allows for a more flexible background model. The proposed model can also be easily applied to paired-end ATAC-seq and DNase-seq data. msCentipede, a Python implementation of our algorithm, is available at http://rajanil.github.io/msCentipede.
format Online
Article
Text
id pubmed-4583425
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4583425 2015-10-02 PLoS One Research Article Public Library of Science 2015-09-25 /pmc/articles/PMC4583425/ /pubmed/26406244 http://dx.doi.org/10.1371/journal.pone.0138030 Text en © 2015 Raj et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title msCentipede: Modeling Heterogeneity across Genomic Sites and Replicates Improves Accuracy in the Inference of Transcription Factor Binding
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4583425/
https://www.ncbi.nlm.nih.gov/pubmed/26406244
http://dx.doi.org/10.1371/journal.pone.0138030
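
The description above mentions multi-scale models for inhomogeneous Poisson processes. As a rough illustration only, the sketch below shows the kind of multi-scale decomposition such models usually rest on: a vector of per-base cleavage counts is rewritten as its total plus, at each scale, the count falling in the left half of every interval. For Poisson counts, the left-half count given the interval total is binomial, which is what lets each scale be modeled separately. The function names and the toy counts are hypothetical and are not taken from the msCentipede code or the paper.

```python
from typing import List, Tuple

Pair = Tuple[int, int]  # (count in left half, count in whole interval)


def multiscale_decompose(counts: List[int]) -> Tuple[int, List[List[Pair]]]:
    """Split a count vector into its total and per-scale (left, total) pairs.

    Scales are returned coarsest first. The length must be a power of two.
    """
    n = len(counts)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a positive power of two")
    levels: List[List[Pair]] = []
    current = list(counts)
    while len(current) > 1:
        # Merge neighbouring bins; remember how much fell in the left one.
        pairs = [(current[i], current[i] + current[i + 1])
                 for i in range(0, len(current), 2)]
        levels.append(pairs)
        current = [total for _, total in pairs]
    levels.reverse()  # coarsest scale first
    return current[0], levels


def multiscale_reconstruct(total: int, levels: List[List[Pair]]) -> List[int]:
    """Invert multiscale_decompose, recovering the original count vector."""
    current = [total]
    for pairs in levels:
        finer: List[int] = []
        for (left, _), parent in zip(pairs, current):
            finer.extend([left, parent - left])
        current = finer
    return current


# Toy example (made-up counts for a 4-base window).
toy_counts = [3, 1, 0, 4]
toy_total, toy_levels = multiscale_decompose(toy_counts)
```

Because the decomposition is invertible, no information about the cleavage profile is lost; a model can place a separate binomial (or over-dispersed) distribution on each left-half count, which is one common way to capture heterogeneity at different spatial resolutions.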