Computational prediction and characterization of cell-type-specific and shared binding sites
MOTIVATION: Cell-type-specific gene expression is maintained in large part by transcription factors (TFs) selectively binding to distinct sets of sites in different cell types. Recent studies have provided evidence that such cell-type-specific binding is determined by a TF’s intrinsic sequence...
Main authors: | Zhang, Qinhu; Teng, Pengrui; Wang, Siguo; He, Ying; Cui, Zhen; Guo, Zhenghao; Liu, Yixin; Yuan, Changan; Liu, Qi; Huang, De-Shuang |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2022 |
Subjects: | Original Paper |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9825777/ https://www.ncbi.nlm.nih.gov/pubmed/36484687 http://dx.doi.org/10.1093/bioinformatics/btac798 |
_version_ | 1784866696414101504 |
---|---|
author | Zhang, Qinhu Teng, Pengrui Wang, Siguo He, Ying Cui, Zhen Guo, Zhenghao Liu, Yixin Yuan, Changan Liu, Qi Huang, De-Shuang |
collection | PubMed |
description | MOTIVATION: Cell-type-specific gene expression is maintained in large part by transcription factors (TFs) selectively binding to distinct sets of sites in different cell types. Recent studies have provided evidence that such cell-type-specific binding is determined by a TF’s intrinsic sequence preferences, cooperative interactions with co-factors, cell-type-specific chromatin landscapes and 3D chromatin interactions. However, computational prediction and characterization of cell-type-specific and shared binding sites has rarely been studied. RESULTS: In this article, we propose two computational approaches for predicting and characterizing cell-type-specific and shared binding sites by integrating multiple types of features, one based on XGBoost and the other on a convolutional neural network (CNN). To validate the performance of our proposed approaches, ChIP-seq datasets of 10 binding factors were collected from the GM12878 (lymphoblastoid) and K562 (erythroleukemic) human hematopoietic cell lines, and each dataset was further categorized into cell-type-specific (GM12878- and K562-specific) and shared binding sites. Then, multiple types of features for these binding sites were integrated to train the XGBoost- and CNN-based models. Experimental results show that our proposed approaches significantly outperform competing methods on three classification tasks. Moreover, we identified independent feature contributions for cell-type-specific and shared sites through SHAP values and explored the ability of the CNN-based model to predict cell-type-specific and shared binding sites by excluding or including DNase signals. Furthermore, we investigated the generalization ability of our proposed approaches to different binding factors in the same cellular environment. AVAILABILITY AND IMPLEMENTATION: The source code is available at: https://github.com/turningpoint1988/CSSBS. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-9825777 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-9825777 2023-01-10 Computational prediction and characterization of cell-type-specific and shared binding sites. Bioinformatics. Original Paper. Oxford University Press 2022-12-09 /pmc/articles/PMC9825777/ /pubmed/36484687 http://dx.doi.org/10.1093/bioinformatics/btac798 Text en © The Author(s) 2022. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Computational prediction and characterization of cell-type-specific and shared binding sites |
topic | Original Paper |
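
The description above mentions that binding sites from GM12878 and K562 were categorized into cell-type-specific and shared sets. This record does not state the criteria the authors used, so the following is only a minimal sketch of one plausible way to do such a split, assuming that a site counts as shared when its interval overlaps any site from the other cell line by at least one base. The names `Peak`, `overlaps` and `partition_peaks`, and the toy coordinates, are hypothetical and are not taken from the CSSBS repository.

```python
from typing import List, Tuple

# A peak is (chromosome, start, end) with a half-open interval [start, end).
# This representation is an assumption made for illustration only.
Peak = Tuple[str, int, int]


def overlaps(a: Peak, b: Peak) -> bool:
    """Return True if two peaks lie on the same chromosome and share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def partition_peaks(
    peaks_a: List[Peak], peaks_b: List[Peak]
) -> Tuple[List[Peak], List[Peak], List[Peak]]:
    """Split peaks into A-specific, B-specific and shared (reported from A's side).

    Uses a simple quadratic scan; real peak sets would call for an
    interval index or a sorted sweep instead.
    """
    specific_a = [p for p in peaks_a if not any(overlaps(p, q) for q in peaks_b)]
    shared = [p for p in peaks_a if any(overlaps(p, q) for q in peaks_b)]
    specific_b = [q for q in peaks_b if not any(overlaps(q, p) for p in peaks_a)]
    return specific_a, specific_b, shared


# Toy coordinates, not real ChIP-seq data.
gm12878 = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 150)]
k562 = [("chr1", 150, 250), ("chr2", 400, 500)]

specific_gm, specific_k, shared = partition_peaks(gm12878, k562)
print("GM12878-specific:", specific_gm)
print("K562-specific:", specific_k)
print("shared:", shared)
```

The three resulting lists correspond to the three site classes named in the description. A stricter overlap rule (for example, a minimum overlap fraction) would change only the `overlaps` function.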