
Romulus: robust multi-state identification of transcription factor binding sites from DNase-seq data

Motivation: Computational prediction of transcription factor (TF) binding sites in the genome remains a challenging task. Here, we present Romulus, a novel computational method for identifying individual TF binding sites from genome sequence information and cell-type–specific experimental data, such...


Bibliographic Details
Main Authors: Jankowski, Aleksander, Tiuryn, Jerzy, Prabhakar, Shyam
Format: Online Article Text
Language: English
Published: Oxford University Press 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4978937/
https://www.ncbi.nlm.nih.gov/pubmed/27153645
http://dx.doi.org/10.1093/bioinformatics/btw209
_version_ 1782447244060917760
author Jankowski, Aleksander
Tiuryn, Jerzy
Prabhakar, Shyam
author_facet Jankowski, Aleksander
Tiuryn, Jerzy
Prabhakar, Shyam
author_sort Jankowski, Aleksander
collection PubMed
description Motivation: Computational prediction of transcription factor (TF) binding sites in the genome remains a challenging task. Here, we present Romulus, a novel computational method for identifying individual TF binding sites from genome sequence information and cell-type–specific experimental data, such as DNase-seq. It combines the strengths of previous approaches, and improves robustness by reducing the number of free parameters in the model by an order of magnitude. Results: We show that Romulus significantly outperforms existing methods across three sources of DNase-seq data, by assessing the performance of these tools against ChIP-seq profiles. The difference was particularly significant when applied to binding site prediction for low-information-content motifs. Our method is capable of inferring multiple binding modes for a single TF, which differ in their DNase I cut profile. Finally, using the model learned by Romulus and ChIP-seq data, we introduce Binding in Closed Chromatin (BCC) as a quantitative measure of TF pioneer factor activity. Uniquely, our measure quantifies a defining feature of pioneer factors, namely their ability to bind closed chromatin. Availability and Implementation: Romulus is freely available as an R package at http://github.com/ajank/Romulus. Contact: ajank@mimuw.edu.pl Supplementary information: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-4978937
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-4978937 2016-08-11 Romulus: robust multi-state identification of transcription factor binding sites from DNase-seq data Jankowski, Aleksander Tiuryn, Jerzy Prabhakar, Shyam Bioinformatics Original Papers Motivation: Computational prediction of transcription factor (TF) binding sites in the genome remains a challenging task. Here, we present Romulus, a novel computational method for identifying individual TF binding sites from genome sequence information and cell-type–specific experimental data, such as DNase-seq. It combines the strengths of previous approaches, and improves robustness by reducing the number of free parameters in the model by an order of magnitude. Results: We show that Romulus significantly outperforms existing methods across three sources of DNase-seq data, by assessing the performance of these tools against ChIP-seq profiles. The difference was particularly significant when applied to binding site prediction for low-information-content motifs. Our method is capable of inferring multiple binding modes for a single TF, which differ in their DNase I cut profile. Finally, using the model learned by Romulus and ChIP-seq data, we introduce Binding in Closed Chromatin (BCC) as a quantitative measure of TF pioneer factor activity. Uniquely, our measure quantifies a defining feature of pioneer factors, namely their ability to bind closed chromatin. Availability and Implementation: Romulus is freely available as an R package at http://github.com/ajank/Romulus. Contact: ajank@mimuw.edu.pl Supplementary information: Supplementary data are available at Bioinformatics online. Oxford University Press 2016-08-15 2016-04-19 /pmc/articles/PMC4978937/ /pubmed/27153645 http://dx.doi.org/10.1093/bioinformatics/btw209 Text en © The Author 2016. Published by Oxford University Press.
http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Original Papers
Jankowski, Aleksander
Tiuryn, Jerzy
Prabhakar, Shyam
Romulus: robust multi-state identification of transcription factor binding sites from DNase-seq data
title Romulus: robust multi-state identification of transcription factor binding sites from DNase-seq data
title_full Romulus: robust multi-state identification of transcription factor binding sites from DNase-seq data
title_fullStr Romulus: robust multi-state identification of transcription factor binding sites from DNase-seq data
title_full_unstemmed Romulus: robust multi-state identification of transcription factor binding sites from DNase-seq data
title_short Romulus: robust multi-state identification of transcription factor binding sites from DNase-seq data
title_sort romulus: robust multi-state identification of transcription factor binding sites from dnase-seq data
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4978937/
https://www.ncbi.nlm.nih.gov/pubmed/27153645
http://dx.doi.org/10.1093/bioinformatics/btw209
work_keys_str_mv AT jankowskialeksander romulusrobustmultistateidentificationoftranscriptionfactorbindingsitesfromdnaseseqdata
AT tiurynjerzy romulusrobustmultistateidentificationoftranscriptionfactorbindingsitesfromdnaseseqdata
AT prabhakarshyam romulusrobustmultistateidentificationoftranscriptionfactorbindingsitesfromdnaseseqdata