Tree-Based Position Weight Matrix Approach to Model Transcription Factor Binding Site Profiles

Bibliographic Details
Main authors: Bi, Yingtao; Kim, Hyunsoo; Gupta, Ravi; Davuluri, Ramana V.
Format: Online Article Text
Language: English
Published: Public Library of Science 2011
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3166302/
https://www.ncbi.nlm.nih.gov/pubmed/21912677
http://dx.doi.org/10.1371/journal.pone.0024210
author Bi, Yingtao
Kim, Hyunsoo
Gupta, Ravi
Davuluri, Ramana V.
collection PubMed
description Most position weight matrix (PWM) based bioinformatics methods for predicting transcription factor binding sites (TFBSs) assume that each nucleotide in the sequence motif contributes independently to the interaction between the protein and the DNA sequence, an assumption that usually produces many false-positive predictions. The increasing availability of TF enrichment profiles from recent ChIP-Seq experiments makes it possible to investigate the dependency structure within motifs and to predict TFBSs more accurately. We develop a novel Tree-based PWM (TPWM) approach to model the interaction between a TF and its binding site accurately. The whole tree-structured PWM can be considered a mixture of different conditional PWMs. We propose a discriminative approach, called TPD (TPWM-based Discriminative Approach), that constructs the TPWM from ChIP-Seq data starting from a pre-existing PWM. To maximize discriminative power between the positive and negative datasets, the cutoff value is chosen based on the Matthews correlation coefficient (MCC). The resulting TPWMs are evaluated for accuracy on extensive synthetic datasets. We then apply TPD to several real ChIP-Seq datasets to refine the current TFBS models stored in the TRANSFAC database. Experiments on both simulated and real ChIP-Seq data show that the proposed method, starting from an existing PWM, consistently outperforms existing tools in detecting TFBSs. The improved accuracy results from modelling the complete dependency structure of the motifs and from a higher true positive rate. The findings could lead to a better understanding of the mechanisms of TF-DNA interactions.
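The abstract's central contrast, an additive PWM that treats positions independently versus a tree of conditional PWMs with an MCC-chosen cutoff, can be illustrated with a small toy sketch. This is not the authors' TPD implementation: the tree has a single split at a hand-picked position rather than one learned discriminatively, each leaf is a plain log2-odds PWM with a pseudocount of 1 against an assumed uniform background, and every function name (`count_columns`, `log_odds_pwm`, `build_tpwm`, `score_tpwm`, `best_cutoff`) is hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

BASES = "ACGT"
PWM = List[Dict[str, float]]  # one {base: log-odds weight} dict per motif position


def count_columns(sites: Sequence[str]) -> List[Dict[str, int]]:
    """Count each base at each position of equal-length aligned sites."""
    cols = [dict.fromkeys(BASES, 0) for _ in range(len(sites[0]))]
    for site in sites:
        for i, base in enumerate(site):
            cols[i][base] += 1
    return cols


def log_odds_pwm(counts: List[Dict[str, int]], background: float = 0.25,
                 pseudocount: float = 1.0) -> PWM:
    """Turn counts into log2-odds weights against a uniform background (an assumption)."""
    pwm = []
    for col in counts:
        total = sum(col.values()) + 4 * pseudocount
        pwm.append({b: math.log2((col[b] + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def score_pwm(pwm: PWM, site: str) -> float:
    # Independence assumption: the site score is a plain sum of per-position weights.
    return sum(pwm[i][base] for i, base in enumerate(site))


def build_tpwm(sites: Sequence[str], split_pos: int) -> Dict[str, PWM]:
    """Hypothetical one-level tree: one conditional PWM per base observed at split_pos."""
    leaves = {}
    for base in BASES:
        subset = [s for s in sites if s[split_pos] == base]
        if subset:
            leaves[base] = log_odds_pwm(count_columns(subset))
    return leaves


def score_tpwm(leaves: Dict[str, PWM], split_pos: int, fallback: PWM, site: str) -> float:
    # Route the site to the conditional PWM for its base at split_pos, then score it.
    return score_pwm(leaves.get(site[split_pos], fallback), site)


def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient; 0.0 when any marginal count is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def best_cutoff(pos: Sequence[float], neg: Sequence[float]) -> Tuple[float, float]:
    """Try each observed score as a threshold (score >= cutoff means 'site'); keep max MCC."""
    best = (float("-inf"), -1.0)
    for cut in sorted(set(pos) | set(neg)):
        tp = sum(s >= cut for s in pos)
        fp = sum(s >= cut for s in neg)
        m = mcc(tp, fp, len(neg) - fp, len(pos) - tp)
        if m > best[1]:
            best = (cut, m)
    return best


if __name__ == "__main__":
    # Toy data: positions 0 and 1 are coupled (A with T, G with C).
    positives = ["AT"] * 4 + ["GC"] * 4
    negatives = ["AC"] * 4 + ["GT"] * 4  # same per-position base counts, wrong pairing
    pwm = log_odds_pwm(count_columns(positives))
    leaves = build_tpwm(positives, split_pos=0)
    pwm_mcc = best_cutoff([score_pwm(pwm, s) for s in positives],
                          [score_pwm(pwm, s) for s in negatives])[1]
    tpwm_mcc = best_cutoff([score_tpwm(leaves, 0, pwm, s) for s in positives],
                           [score_tpwm(leaves, 0, pwm, s) for s in negatives])[1]
    print(pwm_mcc, tpwm_mcc)  # 0.0 1.0
```

Because the toy negatives have exactly the same per-position base counts as the positives, the additive PWM assigns every site the same score and no cutoff separates the classes (MCC 0). Routing each site through the conditional PWM selected by its first base recovers the A-T / G-C coupling, and the MCC-maximizing cutoff then separates them perfectly (MCC 1).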
format Online
Article
Text
id pubmed-3166302
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3166302 2011-09-12 PLoS One Research Article
Public Library of Science 2011-09-02 /pmc/articles/PMC3166302/ /pubmed/21912677 http://dx.doi.org/10.1371/journal.pone.0024210 Text en Bi et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Tree-Based Position Weight Matrix Approach to Model Transcription Factor Binding Site Profiles
topic Research Article