EMQIT: a machine learning approach for energy based PWM matrix quality improvement


Bibliographic Details
Main authors: Smolinska, Karolina; Pacholczyk, Marcin
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5539975/
https://www.ncbi.nlm.nih.gov/pubmed/28764727
http://dx.doi.org/10.1186/s13062-017-0189-y
_version_ 1783254572329861120
author Smolinska, Karolina
Pacholczyk, Marcin
collection PubMed
description BACKGROUND: Transcription factor (TF) binding affinities to DNA play a key role in gene regulation. Understanding the specificity of the mechanisms by which TFs bind to DNA is important to both experimentalists and theoreticians. With the development of high-throughput methods such as ChIP-seq, the need for unbiased models of binding events has become apparent. We present EMQIT, a modification of the approach introduced by Alamanova et al. and later implemented as the 3DTF server. We observed that tuning the Boltzmann factor weights, which are used to convert calculated energies into nucleotide probabilities, has a significant impact on the quality of the resulting position weight matrix (PWM). RESULTS: Consequently, we propose using receiver operating characteristic (ROC) curves and 10-fold cross-validation to learn the best weights from experimentally verified data in the TRANSFAC database. We applied our method to data available for various TFs. Using experimental data from the TRANSFAC database, we verified how efficiently the 3DTF matrices improved with our technique detect TF binding sites (TFBSs). The comparison showed significant similarity and comparable performance between the improved matrices and the experimental (TRANSFAC) matrices. The improved 3DTF matrices achieved significantly higher area under the curve (AUC) values than the original 3DTF matrices (by at least 0.1) and, at the same time, detected notably more experimentally verified TFBSs. CONCLUSIONS: The improved PWMs for the analyzed factors are similar to the TRANSFAC matrices and have comparable predictive capabilities. Moreover, the improved PWMs achieve better results than the matrices downloaded from the 3DTF server. The presented approach is general and applicable to any energy-based matrices. EMQIT is available online at http://biosolvers.polsl.pl:3838/emqit. REVIEWERS: This article was reviewed by Oliviero Carugo, Marek Kimmel and István Simon.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13062-017-0189-y) contains supplementary material, which is available to authorized users.
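The conversion of calculated energies to nucleotide probabilities through weighted Boltzmann factors, and the ROC-based evaluation named in the abstract, can be illustrated with a minimal Python sketch. This is not the EMQIT or 3DTF implementation: the record does not specify how the weights are parameterized, so the sketch assumes one hypothetical weight per motif position, and the function names (energies_to_pwm, score, auc) are invented for illustration. The 10-fold cross-validation search over weights is omitted.

```python
import math

def energies_to_pwm(energies, weights):
    """Turn per-position binding energies into nucleotide probabilities.

    energies: list of dicts (one per motif position) mapping 'A', 'C', 'G', 'T'
              to a calculated energy, where lower means more favourable.
    weights:  one Boltzmann weight per position (an assumption of this sketch).
    """
    pwm = []
    for column, w in zip(energies, weights):
        e_min = min(column.values())  # shift energies for numerical stability
        factors = {b: math.exp(-w * (e - e_min)) for b, e in column.items()}
        total = sum(factors.values())
        pwm.append({b: f / total for b, f in factors.items()})
    return pwm

def score(pwm, site):
    """Log-probability of a site that has the same length as the PWM."""
    return sum(math.log(col[base]) for col, base in zip(pwm, site))

def auc(pos_scores, neg_scores):
    """Area under the ROC curve by pairwise comparison (ties count 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

In a setting like the one the abstract describes, candidate weight vectors would be compared by the AUC they achieve on held-out verified and background sites; here any such site lists would have to be supplied by the user.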
format Online
Article
Text
id pubmed-5539975
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5539975 2017-08-03 EMQIT: a machine learning approach for energy based PWM matrix quality improvement Smolinska, Karolina; Pacholczyk, Marcin Biol Direct Research BioMed Central 2017-08-01 /pmc/articles/PMC5539975/ /pubmed/28764727 http://dx.doi.org/10.1186/s13062-017-0189-y Text en © The Author(s). 2017 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title EMQIT: a machine learning approach for energy based PWM matrix quality improvement
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5539975/
https://www.ncbi.nlm.nih.gov/pubmed/28764727
http://dx.doi.org/10.1186/s13062-017-0189-y