Computational prediction and characterization of cell-type-specific and shared binding sites

MOTIVATION: Cell-type-specific gene expression is maintained in large part by transcription factors (TFs) selectively binding to distinct sets of sites in different cell types. Recent studies have provided evidence that such cell-type-specific binding is determined by TFs’ intrinsic sequence...

Bibliographic Details
Main Authors: Zhang, Qinhu, Teng, Pengrui, Wang, Siguo, He, Ying, Cui, Zhen, Guo, Zhenghao, Liu, Yixin, Yuan, Changan, Liu, Qi, Huang, De-Shuang
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9825777/
https://www.ncbi.nlm.nih.gov/pubmed/36484687
http://dx.doi.org/10.1093/bioinformatics/btac798
_version_ 1784866696414101504
author Zhang, Qinhu
Teng, Pengrui
Wang, Siguo
He, Ying
Cui, Zhen
Guo, Zhenghao
Liu, Yixin
Yuan, Changan
Liu, Qi
Huang, De-Shuang
author_sort Zhang, Qinhu
collection PubMed
description MOTIVATION: Cell-type-specific gene expression is maintained in large part by transcription factors (TFs) selectively binding to distinct sets of sites in different cell types. Recent studies have provided evidence that such cell-type-specific binding is determined by TFs’ intrinsic sequence preferences, cooperative interactions with co-factors, cell-type-specific chromatin landscapes and 3D chromatin interactions. However, computational prediction and characterization of cell-type-specific and shared binding sites has rarely been studied. RESULTS: In this article, we propose two computational approaches for predicting and characterizing cell-type-specific and shared binding sites by integrating multiple types of features: one is based on XGBoost and the other on a convolutional neural network (CNN). To validate the performance of our proposed approaches, ChIP-seq datasets of 10 binding factors were collected from the GM12878 (lymphoblastoid) and K562 (erythroleukemic) human hematopoietic cell lines, and the binding sites of each factor were further categorized into cell-type-specific (GM12878- and K562-specific) and shared binding sites. Then, multiple types of features for these binding sites were integrated to train the XGBoost- and CNN-based models. Experimental results show that our proposed approaches significantly outperform competing methods on three classification tasks. Moreover, we identified independent feature contributions for cell-type-specific and shared sites through SHAP values and explored the ability of the CNN-based model to predict cell-type-specific and shared binding sites when DNase signals are excluded or included. Furthermore, we investigated the generalization ability of our proposed approaches to different binding factors in the same cellular environment. AVAILABILITY AND IMPLEMENTATION: The source code is available at: https://github.com/turningpoint1988/CSSBS.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
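The categorization step named in the abstract (GM12878-specific, K562-specific and shared binding sites) can be illustrated with a minimal sketch. The abstract does not state the criterion the authors used, so the simple interval-overlap rule below is an assumption, and the function names and toy peaks are hypothetical rather than taken from the CSSBS repository.

```python
from typing import Dict, List, Tuple

# A peak is (chromosome, start, end) with a half-open interval [start, end).
Peak = Tuple[str, int, int]


def overlaps(a: Peak, b: Peak) -> bool:
    """Return True if two peaks lie on the same chromosome and share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def categorize_sites(peaks_gm12878: List[Peak], peaks_k562: List[Peak]) -> Dict[str, List[Peak]]:
    """Split peaks into cell-type-specific and shared groups.

    Assumption (not from the source): a GM12878 peak is 'shared' if it overlaps
    any K562 peak; peaks without an overlap in the other cell line are specific.
    Shared sites are reported using the GM12878 coordinates.
    """
    shared = [p for p in peaks_gm12878 if any(overlaps(p, q) for q in peaks_k562)]
    gm_specific = [p for p in peaks_gm12878 if not any(overlaps(p, q) for q in peaks_k562)]
    k562_specific = [q for q in peaks_k562 if not any(overlaps(q, p) for p in peaks_gm12878)]
    return {
        "GM12878-specific": gm_specific,
        "K562-specific": k562_specific,
        "shared": shared,
    }


# Toy peaks for illustration only; these are not real ChIP-seq data.
gm12878_peaks = [("chr1", 100, 200), ("chr1", 500, 600)]
k562_peaks = [("chr1", 150, 250), ("chr2", 100, 200)]

groups = categorize_sites(gm12878_peaks, k562_peaks)
for label, peaks in groups.items():
    print(label, peaks)
```

The pairwise scan is quadratic in the number of peaks, which is fine for a toy example; for genome-scale peak sets a sorted sweep or an interval-tree library would be the more practical choice.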
format Online
Article
Text
id pubmed-9825777
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9825777 2023-01-10 Computational prediction and characterization of cell-type-specific and shared binding sites Zhang, Qinhu; Teng, Pengrui; Wang, Siguo; He, Ying; Cui, Zhen; Guo, Zhenghao; Liu, Yixin; Yuan, Changan; Liu, Qi; Huang, De-Shuang Bioinformatics Original Paper Oxford University Press 2022-12-09 /pmc/articles/PMC9825777/ /pubmed/36484687 http://dx.doi.org/10.1093/bioinformatics/btac798 Text en © The Author(s) 2022. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Computational prediction and characterization of cell-type-specific and shared binding sites
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9825777/
https://www.ncbi.nlm.nih.gov/pubmed/36484687
http://dx.doi.org/10.1093/bioinformatics/btac798