Application of alternative de novo motif recognition models for analysis of structural heterogeneity of transcription factor binding sites: a case study of FOXA2 binding sites
The most popular model for the search of ChIP-seq data for transcription factor binding sites (TFBS) is the positional weight matrix (PWM). However, this model does not take into account dependencies between nucleotide occurrences in different site positions. Currently, two recently proposed models,...
Main authors: | Tsukanov, A.V.; Levitsky, V.G.; Merkulova, T.I. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | The Federal Research Center Institute of Cytology and Genetics of Siberian Branch of the Russian Academy of Sciences, 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8408018/ https://www.ncbi.nlm.nih.gov/pubmed/34547062 http://dx.doi.org/10.18699/VJ21.002 |
_version_ | 1783746737832198144 |
---|---|
author | Tsukanov, A.V. Levitsky, V.G. Merkulova, T.I. |
author_facet | Tsukanov, A.V. Levitsky, V.G. Merkulova, T.I. |
author_sort | Tsukanov, A.V. |
collection | PubMed |
description | The most popular model for searching ChIP-seq data for transcription factor binding sites (TFBS) is the positional weight matrix (PWM). However, this model does not take into account dependencies between nucleotide occurrences at different site positions. Two recently proposed models, BaMM and InMoDe, do account for such dependencies. However, these models have usually been applied only to compare their recognition accuracy with that of PWMs, and no analysis of the co-prediction and relative positioning of hits of different models within peaks has yet been performed. To close this gap, we propose a pipeline called MultiDeNA. The pipeline includes the stages of training the models, assessing their recognition accuracy, scanning ChIP-seq peaks, and classifying the peaks based on the scan results. We applied the pipeline to 22 ChIP-seq datasets for the TF FOXA2 and considered the PWM, dinucleotide PWM (diPWM), BaMM and InMoDe models. Combining these four models significantly increased the fraction of recognized peaks compared to the PWM model alone: the increase was 26.3 %. The BaMM model made the main contribution to site recognition. Although the major fraction of predicted peaks contained TFBS of different models at coinciding positions, the median fractions of peaks containing predictions of only a single model were 1.08, 0.49, 4.15 and 1.73 % for PWM, diPWM, BaMM and InMoDe, respectively. Thus, FOXA2 BSs were not fully described by any single model, which indicates their heterogeneity. We assume that the BaMM model is the most successful in describing the structure of the FOXA2 BS in the ChIP-seq datasets under study. |
format | Online Article Text |
id | pubmed-8408018 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | The Federal Research Center Institute of Cytology and Genetics of Siberian Branch of the Russian Academy of Sciences |
record_format | MEDLINE/PubMed |
spelling | pubmed-8408018 2021-09-17 Application of alternative de novo motif recognition models for analysis of structural heterogeneity of transcription factor binding sites: a case study of FOXA2 binding sites Tsukanov, A.V. Levitsky, V.G. Merkulova, T.I. Vavilovskii Zhurnal Genet Selektsii / Original article The Federal Research Center Institute of Cytology and Genetics of Siberian Branch of the Russian Academy of Sciences 2021-02 /pmc/articles/PMC8408018/ /pubmed/34547062 http://dx.doi.org/10.18699/VJ21.002 Text en Copyright © AUTHORS https://creativecommons.org/licenses/by/2.5/ This work is licensed under a Creative Commons Attribution 4.0 License |
title | Application of alternative de novo motif recognition models for analysis of structural heterogeneity of transcription factor binding sites: a case study of FOXA2 binding sites |
topic | / Original article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8408018/ https://www.ncbi.nlm.nih.gov/pubmed/34547062 http://dx.doi.org/10.18699/VJ21.002 |
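
The description above outlines four stages: model training, accuracy assessment, peak scanning, and peak classification based on the scan results. The scanning and classification stages can be illustrated with a minimal Python sketch. This is not the MultiDeNA implementation. The function names, the pseudocount, the uniform background, and the thresholds are all assumptions chosen for illustration, and only a plain log-odds PWM is built here. Any other model (diPWM, BaMM, InMoDe) would be supplied as a callable that returns a `(score, start)` pair.

```python
import math
from collections import Counter

ALPHABET = "ACGT"  # assumption: sequences are uppercase and contain only A, C, G, T


def build_pwm(sites, pseudocount=0.5, background=0.25):
    """Build a log-odds PWM from equal-length aligned sites (illustrative only)."""
    total = len(sites) + pseudocount * len(ALPHABET)
    pwm = []
    for pos in range(len(sites[0])):
        counts = Counter(site[pos] for site in sites)
        pwm.append({
            base: math.log2((counts[base] + pseudocount) / total / background)
            for base in ALPHABET
        })
    return pwm


def best_hit(pwm, sequence):
    """Return (score, start) of the best-scoring window, or None if too short."""
    width = len(pwm)
    best = None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(column[base] for column, base in zip(pwm, window))
        if best is None or score > best[0]:
            best = (score, start)
    return best


def classify_peaks(peaks, scanners, thresholds):
    """Map each peak id to the set of model names whose best hit passes its threshold."""
    classes = {}
    for peak_id, sequence in peaks.items():
        recognized = set()
        for name, scan in scanners.items():
            hit = scan(sequence)
            if hit is not None and hit[0] >= thresholds[name]:
                recognized.add(name)
        classes[peak_id] = frozenset(recognized)
    return classes


def sole_model_fractions(classes, model_names):
    """Fraction of all peaks recognized by exactly one model, reported per model."""
    total = len(classes)
    if total == 0:
        return {name: 0.0 for name in model_names}
    return {
        name: sum(1 for c in classes.values() if c == frozenset([name])) / total
        for name in model_names
    }
```

With several scanners registered, the per-peak sets returned by `classify_peaks` show which peaks are recognized by several models and which by only one, which is the kind of breakdown the description reports as per-model fractions. Comparing hit positions between models would additionally use the `start` value returned by each scanner.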