Assessing the model transferability for prediction of transcription factor binding sites based on chromatin accessibility

BACKGROUND: Computational prediction of transcription factor (TF) binding sites in different cell types is challenging. Recent technology development allows us to determine the genome-wide chromatin accessibility in various cellular and developmental contexts. The chromatin accessibility profiles pr...

Bibliographic Details
Main Authors: Liu, Sheng, Zibetti, Cristina, Wan, Jun, Wang, Guohua, Blackshaw, Seth, Qian, Jiang
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5530957/
https://www.ncbi.nlm.nih.gov/pubmed/28750606
http://dx.doi.org/10.1186/s12859-017-1769-7
_version_ 1783253325699874816
author Liu, Sheng
Zibetti, Cristina
Wan, Jun
Wang, Guohua
Blackshaw, Seth
Qian, Jiang
author_facet Liu, Sheng
Zibetti, Cristina
Wan, Jun
Wang, Guohua
Blackshaw, Seth
Qian, Jiang
author_sort Liu, Sheng
collection PubMed
description BACKGROUND: Computational prediction of transcription factor (TF) binding sites in different cell types is challenging. Recent technology development allows us to determine the genome-wide chromatin accessibility in various cellular and developmental contexts. The chromatin accessibility profiles provide useful information in prediction of TF binding events in various physiological conditions. Furthermore, ChIP-Seq analysis was used to determine genome-wide binding sites for a range of different TFs in multiple cell types. Integration of these two types of genomic information can improve the prediction of TF binding events. RESULTS: We assessed to what extent a model built upon other TFs and/or other cell types could be used to predict the binding sites of TFs of interest. A random forest model was built using a set of cell type-independent features such as specific sequences recognized by the TFs and evolutionary conservation, as well as cell type-specific features derived from chromatin accessibility data. Our analysis suggested that the models learned from other TFs and/or cell lines performed almost as well as the model learned from the target TF in the cell type of interest. Interestingly, models based on multiple TFs performed better than single-TF models. Finally, we proposed a universal model, BPAC, which was generated using ChIP-Seq data from multiple TFs in various cell types. CONCLUSION: Integrating chromatin accessibility information with sequence information improves prediction of TF binding. The prediction of TF binding is transferable across TFs and/or cell lines, suggesting that there is a set of universal “rules”. A computational tool was developed to predict TF binding sites based on the universal “rules”. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-017-1769-7) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5530957
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5530957 2017-08-02 Assessing the model transferability for prediction of transcription factor binding sites based on chromatin accessibility Liu, Sheng Zibetti, Cristina Wan, Jun Wang, Guohua Blackshaw, Seth Qian, Jiang BMC Bioinformatics Research Article BACKGROUND: Computational prediction of transcription factor (TF) binding sites in different cell types is challenging. Recent technology development allows us to determine the genome-wide chromatin accessibility in various cellular and developmental contexts. The chromatin accessibility profiles provide useful information in prediction of TF binding events in various physiological conditions. Furthermore, ChIP-Seq analysis was used to determine genome-wide binding sites for a range of different TFs in multiple cell types. Integration of these two types of genomic information can improve the prediction of TF binding events. RESULTS: We assessed to what extent a model built upon other TFs and/or other cell types could be used to predict the binding sites of TFs of interest. A random forest model was built using a set of cell type-independent features such as specific sequences recognized by the TFs and evolutionary conservation, as well as cell type-specific features derived from chromatin accessibility data. Our analysis suggested that the models learned from other TFs and/or cell lines performed almost as well as the model learned from the target TF in the cell type of interest. Interestingly, models based on multiple TFs performed better than single-TF models. Finally, we proposed a universal model, BPAC, which was generated using ChIP-Seq data from multiple TFs in various cell types. CONCLUSION: Integrating chromatin accessibility information with sequence information improves prediction of TF binding. The prediction of TF binding is transferable across TFs and/or cell lines, suggesting that there is a set of universal “rules”. A computational tool was developed to predict TF binding sites based on the universal “rules”. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-017-1769-7) contains supplementary material, which is available to authorized users. BioMed Central 2017-07-27 /pmc/articles/PMC5530957/ /pubmed/28750606 http://dx.doi.org/10.1186/s12859-017-1769-7 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research Article
Liu, Sheng
Zibetti, Cristina
Wan, Jun
Wang, Guohua
Blackshaw, Seth
Qian, Jiang
Assessing the model transferability for prediction of transcription factor binding sites based on chromatin accessibility
title Assessing the model transferability for prediction of transcription factor binding sites based on chromatin accessibility
title_full Assessing the model transferability for prediction of transcription factor binding sites based on chromatin accessibility
title_fullStr Assessing the model transferability for prediction of transcription factor binding sites based on chromatin accessibility
title_full_unstemmed Assessing the model transferability for prediction of transcription factor binding sites based on chromatin accessibility
title_short Assessing the model transferability for prediction of transcription factor binding sites based on chromatin accessibility
title_sort assessing the model transferability for prediction of transcription factor binding sites based on chromatin accessibility
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5530957/
https://www.ncbi.nlm.nih.gov/pubmed/28750606
http://dx.doi.org/10.1186/s12859-017-1769-7
work_keys_str_mv AT liusheng assessingthemodeltransferabilityforpredictionoftranscriptionfactorbindingsitesbasedonchromatinaccessibility
AT zibetticristina assessingthemodeltransferabilityforpredictionoftranscriptionfactorbindingsitesbasedonchromatinaccessibility
AT wanjun assessingthemodeltransferabilityforpredictionoftranscriptionfactorbindingsitesbasedonchromatinaccessibility
AT wangguohua assessingthemodeltransferabilityforpredictionoftranscriptionfactorbindingsitesbasedonchromatinaccessibility
AT blackshawseth assessingthemodeltransferabilityforpredictionoftranscriptionfactorbindingsitesbasedonchromatinaccessibility
AT qianjiang assessingthemodeltransferabilityforpredictionoftranscriptionfactorbindingsitesbasedonchromatinaccessibility