
Modeling binding specificities of transcription factor pairs with random forests

BACKGROUND: Transcription factors (TFs) bind regulatory DNA regions with sequence specificity, form complexes and regulate gene expression. In cooperative TF-TF binding, two transcription factors bind onto a shared DNA binding site as a pair. Previous work has demonstrated pairwise TF-TF-DNA interac...


Bibliographic Details
Main authors: Antikainen, Anni A., Heinonen, Markus, Lähdesmäki, Harri
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9166390/
https://www.ncbi.nlm.nih.gov/pubmed/35659235
http://dx.doi.org/10.1186/s12859-022-04734-7
_version_ 1784720592297000960
author Antikainen, Anni A.
Heinonen, Markus
Lähdesmäki, Harri
author_facet Antikainen, Anni A.
Heinonen, Markus
Lähdesmäki, Harri
author_sort Antikainen, Anni A.
collection PubMed
description BACKGROUND: Transcription factors (TFs) bind regulatory DNA regions with sequence specificity, form complexes and regulate gene expression. In cooperative TF-TF binding, two transcription factors bind onto a shared DNA binding site as a pair. Previous work has demonstrated pairwise TF-TF-DNA interactions with position weight matrices (PWMs), which, however, may not sufficiently account for the complexity and flexibility of pairwise binding. RESULTS: We propose two random forest (RF) methods for joint TF-TF binding site prediction: ComBind and JointRF. We train models with previously published large-scale CAP-SELEX DNA libraries, which comprise DNA sequences enriched for binding of a selected TF pair. JointRF builds a random forest with sub-sequences selected from CAP-SELEX DNA reads using a previously proposed pairwise PWM. JointRF outperforms (area under the receiver operating characteristic curve, AUROC, 0.75) the current state-of-the-art method, i.e., orientation- and spacing-specific pairwise PWMs (AUROC 0.59). Thus, JointRF may be utilized to improve prediction accuracy for pre-determined binding preferences. However, pairwise TF binding is currently considered flexible; a pair may bind DNA with different orientations and amounts of dinucleotide gaps or overlap between the two motifs. Thus, we developed ComBind, which utilizes random forests by simultaneously considering multiple orientations and spacings of the two factors. Our approach outperforms (AUROC 0.78) PWMs, as well as JointRF (p < 0.00195). ComBind provides an approach for predicting TF-TF binding sites without prior knowledge of pairwise binding preferences. However, more research is needed to assess ComBind's suitability for practical applications. CONCLUSIONS: Random forest is well suited for modeling pairwise TF-TF-DNA binding specificities, and ComBind provides an improvement to pairwise binding site prediction accuracy.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-022-04734-7.
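For orientation, the two quantities the abstract compares can be sketched in a few lines of Python. This is not the authors' code: it scores a sequence against a single toy PWM by summed log-odds and computes AUROC as the fraction of positive/negative pairs that are ranked correctly. The matrix `toy_pwm`, the uniform background of 0.25, the example sequences, and all function names are hypothetical and chosen only for illustration.

```python
import math


def pwm_log_odds(pwm, seq, background=0.25):
    """Sum per-position log2 odds of seq under a PWM (one dict per position)."""
    return sum(math.log2(col[base] / background) for col, base in zip(pwm, seq))


def best_window_score(pwm, seq):
    """Highest log-odds score over all windows of len(pwm) in seq."""
    k = len(pwm)
    return max(pwm_log_odds(pwm, seq[i:i + k]) for i in range(len(seq) - k + 1))


def auroc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs where the positive scores higher.

    Ties count as half a correctly ranked pair.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def column(consensus, weight=0.7):
    """Hypothetical PWM column: `weight` on one base, the rest shared equally."""
    rest = (1.0 - weight) / 3.0
    return {b: (weight if b == consensus else rest) for b in "ACGT"}


# Hypothetical 3-position motif with consensus "ACG" (illustration only).
toy_pwm = [column("A"), column("C"), column("G")]

# Hypothetical bound (positive) and unbound (negative) sequences.
positives = ["TTACGTT", "ACGAAAA"]
negatives = ["TTTTTTT", "CCCCCCC"]

pos_scores = [best_window_score(toy_pwm, s) for s in positives]
neg_scores = [best_window_score(toy_pwm, s) for s in negatives]
print("AUROC on toy data:", auroc(pos_scores, neg_scores))
```

In the setting the abstract describes, the scores would come from a pairwise PWM or from a random forest's predicted probabilities rather than from this single toy matrix; the AUROC calculation would be the same.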
format Online
Article
Text
id pubmed-9166390
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-9166390 2022-06-05 Modeling binding specificities of transcription factor pairs with random forests Antikainen, Anni A. Heinonen, Markus Lähdesmäki, Harri BMC Bioinformatics Research BioMed Central 2022-06-03 /pmc/articles/PMC9166390/ /pubmed/35659235 http://dx.doi.org/10.1186/s12859-022-04734-7 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle Research
Antikainen, Anni A.
Heinonen, Markus
Lähdesmäki, Harri
Modeling binding specificities of transcription factor pairs with random forests
title Modeling binding specificities of transcription factor pairs with random forests
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9166390/
https://www.ncbi.nlm.nih.gov/pubmed/35659235
http://dx.doi.org/10.1186/s12859-022-04734-7