Anchor: trans-cell type prediction of transcription factor binding sites

The ENCyclopedia of DNA Elements (ENCODE) consortium has generated transcription factor (TF) binding ChIP-seq data covering hundreds of TF proteins and cell types; however, due to limits on time and resources, only a small fraction of all possible TF-cell type pairs have been profiled. One solution...

Bibliographic Details
Main Authors:	Li, Hongyang; Quang, Daniel; Guan, Yuanfang
Format:	Online Article Text
Language:	English
Published:	Cold Spring Harbor Laboratory Press 2019
Subjects:	Method
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6360811/ https://www.ncbi.nlm.nih.gov/pubmed/30567711 http://dx.doi.org/10.1101/gr.237156.118

_version_	1783392584397225984
author	Li, Hongyang Quang, Daniel Guan, Yuanfang
author_facet	Li, Hongyang Quang, Daniel Guan, Yuanfang
author_sort	Li, Hongyang
collection	PubMed
description	The ENCyclopedia of DNA Elements (ENCODE) consortium has generated transcription factor (TF) binding ChIP-seq data covering hundreds of TF proteins and cell types; however, due to limits on time and resources, only a small fraction of all possible TF-cell type pairs have been profiled. One solution is to build machine learning models trained on currently available epigenomic data sets that can be applied to the remaining missing pairs. A major challenge is that TF binding sites are cell-type–specific, which can be attributed to cellular contexts such as chromatin accessibility. Meanwhile, indirect TF-DNA binding and interactions between TFs complicate this regulatory process. Technical issues such as sequencing biases and batch effects render the prediction task even more challenging. Many pioneering efforts have been made to predict TF binding profiles based on DNA sequence and DNase-seq footprints, but to what extent a model can be generalized to completely untested cell conditions remains unknown. In this study, we describe our first-place solution to the 2017 ENCODE-DREAM in vivo TF binding site prediction challenge. By carefully addressing multisource biases and information imbalance across cell types, we created a pipeline that significantly outperforms the current state-of-the-art methods. The proposed method is sufficiently complex to model nonlinear interactions between TF binding motifs and chromatin accessibility information up to 1500 bp from the genomic region of interest.
format	Online Article Text
id	pubmed-6360811
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	Cold Spring Harbor Laboratory Press
record_format	MEDLINE/PubMed
spelling	pubmed-6360811 2019-08-01 Anchor: trans-cell type prediction of transcription factor binding sites Li, Hongyang Quang, Daniel Guan, Yuanfang Genome Res Method The ENCyclopedia of DNA Elements (ENCODE) consortium has generated transcription factor (TF) binding ChIP-seq data covering hundreds of TF proteins and cell types; however, due to limits on time and resources, only a small fraction of all possible TF-cell type pairs have been profiled. One solution is to build machine learning models trained on currently available epigenomic data sets that can be applied to the remaining missing pairs. A major challenge is that TF binding sites are cell-type–specific, which can be attributed to cellular contexts such as chromatin accessibility. Meanwhile, indirect TF-DNA binding and interactions between TFs complicate this regulatory process. Technical issues such as sequencing biases and batch effects render the prediction task even more challenging. Many pioneering efforts have been made to predict TF binding profiles based on DNA sequence and DNase-seq footprints, but to what extent a model can be generalized to completely untested cell conditions remains unknown. In this study, we describe our first-place solution to the 2017 ENCODE-DREAM in vivo TF binding site prediction challenge. By carefully addressing multisource biases and information imbalance across cell types, we created a pipeline that significantly outperforms the current state-of-the-art methods. The proposed method is sufficiently complex to model nonlinear interactions between TF binding motifs and chromatin accessibility information up to 1500 bp from the genomic region of interest.
Cold Spring Harbor Laboratory Press 2019-02 /pmc/articles/PMC6360811/ /pubmed/30567711 http://dx.doi.org/10.1101/gr.237156.118 Text en © 2019 Li et al.; Published by Cold Spring Harbor Laboratory Press http://creativecommons.org/licenses/by-nc/4.0/ This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see http://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
spellingShingle	Method Li, Hongyang Quang, Daniel Guan, Yuanfang Anchor: trans-cell type prediction of transcription factor binding sites
title	Anchor: trans-cell type prediction of transcription factor binding sites
title_full	Anchor: trans-cell type prediction of transcription factor binding sites
title_fullStr	Anchor: trans-cell type prediction of transcription factor binding sites
title_full_unstemmed	Anchor: trans-cell type prediction of transcription factor binding sites
title_short	Anchor: trans-cell type prediction of transcription factor binding sites
title_sort	anchor: trans-cell type prediction of transcription factor binding sites
topic	Method
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6360811/ https://www.ncbi.nlm.nih.gov/pubmed/30567711 http://dx.doi.org/10.1101/gr.237156.118
work_keys_str_mv	AT lihongyang anchortranscelltypepredictionoftranscriptionfactorbindingsites AT quangdaniel anchortranscelltypepredictionoftranscriptionfactorbindingsites AT guanyuanfang anchortranscelltypepredictionoftranscriptionfactorbindingsites

Similar Items