
SeqGL Identifies Context-Dependent Binding Signals in Genome-Wide Regulatory Element Maps



Bibliographic Details
Main authors: Setty, Manu; Leslie, Christina S.
Format: Online Article Text
Language: English
Published: Public Library of Science, 2015
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4446265/
https://www.ncbi.nlm.nih.gov/pubmed/26016777
http://dx.doi.org/10.1371/journal.pcbi.1004271

Description: Genome-wide maps of transcription factor (TF) occupancy and regions of open chromatin implicitly contain DNA sequence signals for multiple factors. We present SeqGL, a novel de novo motif discovery algorithm to identify multiple TF sequence signals from ChIP-, DNase-, and ATAC-seq profiles. SeqGL trains a discriminative model using a k-mer feature representation together with group lasso regularization to extract a collection of sequence signals that distinguish peak sequences from flanking regions. Benchmarked on over 100 ChIP-seq experiments, SeqGL outperformed traditional motif discovery tools in discriminative accuracy. Furthermore, SeqGL can be naturally used with multitask learning to identify genomic and cell-type context determinants of TF binding. SeqGL successfully scales to the large multiplicity of sequence signals in DNase- or ATAC-seq maps. In particular, SeqGL was able to identify a number of ChIP-seq validated sequence signals that were not found by traditional motif discovery algorithms. Thus compared to widely used motif discovery algorithms, SeqGL demonstrates both greater discriminative accuracy and higher sensitivity for detecting the DNA sequence signals underlying regulatory element maps. SeqGL is available at http://cbio.mskcc.org/public/Leslie/SeqGL/.
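The description above names two ingredients: a k-mer feature representation and group lasso regularization on a discriminative model that separates peak sequences from flanking regions. What follows is a minimal sketch of that combination under stated assumptions, not SeqGL's implementation. It assumes a logistic loss, groups each k-mer with its reverse complement, weights each group penalty by the square root of the group size, omits an intercept, and fits by plain proximal gradient descent. The function names (`kmer_counts`, `revcomp_groups`, `group_soft_threshold`, `fit_group_lasso_logreg`) and the toy sequences are hypothetical.

```python
import math
from itertools import product

# Minimal sketch, NOT SeqGL's code. Assumptions: logistic loss,
# groups = {k-mer, reverse complement}, penalty weight sqrt(|g|), no intercept,
# plain proximal gradient descent. All names and toy data are hypothetical.


def kmer_counts(seq, k):
    """Count overlapping k-mers in an uppercase A/C/G/T sequence."""
    counts = {}
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        counts[kmer] = counts.get(kmer, 0) + 1
    return counts


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq[::-1].translate(str.maketrans("ACGT", "TGCA"))


def revcomp_groups(k):
    """Hypothetical grouping: each k-mer shares a group with its reverse complement."""
    seen, groups = set(), []
    for letters in product("ACGT", repeat=k):
        kmer = "".join(letters)
        if kmer in seen:
            continue
        group = sorted({kmer, revcomp(kmer)})  # palindromes form singleton groups
        seen.update(group)
        groups.append(group)
    return groups


def group_soft_threshold(block, tau):
    """Proximal operator of tau * ||block||_2: shrink the whole block, or zero it."""
    norm = math.sqrt(sum(v * v for v in block))
    if norm <= tau:
        return [0.0] * len(block)
    scale = 1.0 - tau / norm
    return [scale * v for v in block]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_group_lasso_logreg(X, y, groups, lam, step=0.1, iters=500):
    """Minimize mean logistic loss + lam * sum_g sqrt(|g|) * ||w_g||_2.

    X is a list of k-mer count dicts; y holds labels (1 = peak, 0 = flank).
    """
    w = {kmer: 0.0 for g in groups for kmer in g}
    n = len(X)
    for _ in range(iters):
        # Gradient of the smooth (logistic) part of the objective.
        grad = dict.fromkeys(w, 0.0)
        for x, label in zip(X, y):
            p = sigmoid(sum(w[f] * c for f, c in x.items() if f in w))
            for f, c in x.items():
                if f in grad:
                    grad[f] += (p - label) * c / n
        # Proximal step: block soft-thresholding applied one group at a time.
        for g in groups:
            block = [w[f] - step * grad[f] for f in g]
            block = group_soft_threshold(block, step * lam * math.sqrt(len(g)))
            for f, v in zip(g, block):
                w[f] = v
    return w


# Toy usage with invented sequences: "peaks" share GATA, "flanks" do not.
peaks = ["AAGATAAA", "CCGATACC", "TTGATATT"]
flanks = ["AAAAAAAA", "CCCCCCCC", "TTTTTTTT"]
K = 4
X = [kmer_counts(s, K) for s in peaks + flanks]
y = [1] * len(peaks) + [0] * len(flanks)
weights = fit_group_lasso_logreg(X, y, revcomp_groups(K), lam=0.05)
selected = [g for g in revcomp_groups(K) if any(weights[f] != 0.0 for f in g)]
```

The block soft-thresholding in `group_soft_threshold` is what separates group lasso from the ordinary lasso: it keeps or zeroes an entire group of k-mers together, so the groups that survive can be read as candidate sequence signals. In this toy run, the group containing GATA ends up with a positive weight because it is the one feature shared by all three peak sequences.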

Journal: PLoS Comput Biol (Research Article)
Publication date: 2015-05-27
Rights: © 2015 Setty, Leslie. This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.