Application of alternative de novo motif recognition models for analysis of structural heterogeneity of transcription factor binding sites: a case study of FOXA2 binding sites

The most popular model for searching ChIP-seq data for transcription factor binding sites (TFBS) is the positional weight matrix (PWM). However, this model does not take into account dependencies between nucleotide occurrences at different site positions. Currently, two recently proposed models,...

Bibliographic Details
Main authors: Tsukanov, A.V., Levitsky, V.G., Merkulova, T.I.
Format: Online Article Text
Language: English
Published: The Federal Research Center Institute of Cytology and Genetics of Siberian Branch of the Russian Academy of Sciences 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8408018/
https://www.ncbi.nlm.nih.gov/pubmed/34547062
http://dx.doi.org/10.18699/VJ21.002
_version_ 1783746737832198144
author Tsukanov, A.V.
Levitsky, V.G.
Merkulova, T.I.
author_facet Tsukanov, A.V.
Levitsky, V.G.
Merkulova, T.I.
author_sort Tsukanov, A.V.
collection PubMed
description The most popular model for searching ChIP-seq data for transcription factor binding sites (TFBS) is the positional weight matrix (PWM). However, this model does not take into account dependencies between nucleotide occurrences at different site positions. Two recently proposed models, BaMM and InMoDe, do account for such dependencies. However, applications of these models have usually been limited to comparing their recognition accuracy with that of PWMs, and no analysis of the co-prediction and relative positioning of hits of different models in peaks has yet been performed. To close this gap, we propose a pipeline called MultiDeNA. The pipeline includes the stages of training the models, assessing their recognition accuracy, scanning ChIP-seq peaks, and classifying the peaks based on the scan results. We applied the pipeline to 22 ChIP-seq datasets for the TF FOXA2 and considered the PWM, dinucleotide PWM (diPWM), BaMM and InMoDe models. Combining these four models allowed a significant increase in the fraction of recognized peaks compared with the PWM model alone: the increase was 26.3 %. The BaMM model made the main contribution to the recognition of sites. Although the major fraction of predicted peaks contained TFBS predicted by different models at coinciding positions, the median fractions of peaks containing predictions of only a single model were 1.08, 0.49, 4.15 and 1.73 % for PWM, diPWM, BaMM and InMoDe, respectively. Thus, FOXA2 binding sites (BSs) were not fully described by any single model, which indicates their heterogeneity. We assume that the BaMM model is the most successful in describing the structure of the FOXA2 BS in the ChIP-seq datasets under study.
format Online
Article
Text
id pubmed-8408018
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher The Federal Research Center Institute of Cytology and Genetics of Siberian Branch of the Russian Academy of Sciences
record_format MEDLINE/PubMed
spelling pubmed-8408018 2021-09-17 Application of alternative de novo motif recognition models for analysis of structural heterogeneity of transcription factor binding sites: a case study of FOXA2 binding sites Tsukanov, A.V. Levitsky, V.G. Merkulova, T.I. Vavilovskii Zhurnal Genet Selektsii / Original article
The Federal Research Center Institute of Cytology and Genetics of Siberian Branch of the Russian Academy of Sciences 2021-02 /pmc/articles/PMC8408018/ /pubmed/34547062 http://dx.doi.org/10.18699/VJ21.002 Text en Copyright © AUTHORS https://creativecommons.org/licenses/by/2.5/ This work is licensed under a Creative Commons Attribution 4.0 License
spellingShingle / Original article
Tsukanov, A.V.
Levitsky, V.G.
Merkulova, T.I.
Application of alternative de novo motif recognition models for analysis of structural heterogeneity of transcription factor binding sites: a case study of FOXA2 binding sites
title Application of alternative de novo motif recognition models for analysis of structural heterogeneity of transcription factor binding sites: a case study of FOXA2 binding sites
title_full Application of alternative de novo motif recognition models for analysis of structural heterogeneity of transcription factor binding sites: a case study of FOXA2 binding sites
title_fullStr Application of alternative de novo motif recognition models for analysis of structural heterogeneity of transcription factor binding sites: a case study of FOXA2 binding sites
title_full_unstemmed Application of alternative de novo motif recognition models for analysis of structural heterogeneity of transcription factor binding sites: a case study of FOXA2 binding sites
title_short Application of alternative de novo motif recognition models for analysis of structural heterogeneity of transcription factor binding sites: a case study of FOXA2 binding sites
title_sort application of alternative de novo motif recognition models for analysis of structural heterogeneity of transcription factor binding sites: a case study of foxa2 binding sites
topic / Original article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8408018/
https://www.ncbi.nlm.nih.gov/pubmed/34547062
http://dx.doi.org/10.18699/VJ21.002
work_keys_str_mv AT tsukanovav applicationofalternativedenovomotifrecognitionmodelsforanalysisofstructuralheterogeneityoftranscriptionfactorbindingsitesacasestudyoffoxa2bindingsites
AT levitskyvg applicationofalternativedenovomotifrecognitionmodelsforanalysisofstructuralheterogeneityoftranscriptionfactorbindingsitesacasestudyoffoxa2bindingsites
AT merkulovati applicationofalternativedenovomotifrecognitionmodelsforanalysisofstructuralheterogeneityoftranscriptionfactorbindingsitesacasestudyoffoxa2bindingsites