A novel method for improved accuracy of transcription factor binding site prediction

Bibliographic Details
Main authors: Khamis, Abdullah M; Motwalli, Olaa; Oliva, Romina; Jankovic, Boris R; Medvedeva, Yulia A; Ashoor, Haitham; Essack, Magbubah; Gao, Xin; Bajic, Vladimir B
Format: Online article (text)
Language: English
Published: Oxford University Press 2018
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6037060/
https://www.ncbi.nlm.nih.gov/pubmed/29617876
http://dx.doi.org/10.1093/nar/gky237
Collection: PubMed
Description: Identifying transcription factor (TF) binding sites (TFBSs) is important in the computational inference of gene regulation. Widely used computational methods of TFBS prediction based on position weight matrices (PWMs) usually have high false positive rates. Moreover, computational studies of transcription regulation in eukaryotes frequently require numerous PWM models of TFBSs due to the large number of TFs involved. To overcome these problems we developed DRAF, a novel method for TFBS prediction that requires only 14 prediction models for 232 human TFs while significantly improving prediction accuracy. DRAF models use more features than PWM models, as they combine information from TFBS sequences and physicochemical properties of TF DNA-binding domains into machine learning models. Evaluation of DRAF on 98 human ChIP-seq datasets shows, on average, 1.54-, 1.96- and 5.19-fold reductions of false positives at the same sensitivities compared with models from HOCOMOCO, TRANSFAC and DeepBind, respectively. This observation suggests that PWM models for TFBS prediction can be efficiently replaced by a small number of DRAF models that significantly improve prediction accuracy. The DRAF method is implemented as a web tool and as stand-alone software, freely available at http://cbrc.kaust.edu.sa/DRAF.
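The fold-reduction figures in the description compare false-positive counts between models at a matched sensitivity. The following is a minimal sketch of that calculation, not the authors' evaluation code. It assumes each model assigns a real-valued score to every candidate site and that labelled positive and negative sites are available (for example, derived from a ChIP-seq dataset). All function names are hypothetical, and the paper's exact protocol (how thresholds are chosen, how results are averaged over the 98 datasets) may differ.

```python
import math
from typing import Sequence


def threshold_for_sensitivity(pos_scores: Sequence[float], sensitivity: float) -> float:
    """Highest score cut-off that still recalls at least `sensitivity` of the positives."""
    if not pos_scores:
        raise ValueError("need at least one positive site")
    if not 0.0 < sensitivity <= 1.0:
        raise ValueError("sensitivity must be in (0, 1]")
    ranked = sorted(pos_scores, reverse=True)
    # Number of positives that must score at or above the cut-off; the small
    # epsilon guards against floating-point products landing just above an integer.
    k = max(1, math.ceil(sensitivity * len(ranked) - 1e-9))
    return ranked[k - 1]


def false_positives_at_sensitivity(pos_scores: Sequence[float],
                                   neg_scores: Sequence[float],
                                   sensitivity: float) -> int:
    """Count negatives scoring at or above the cut-off chosen on the positives."""
    cutoff = threshold_for_sensitivity(pos_scores, sensitivity)
    return sum(1 for s in neg_scores if s >= cutoff)


def fp_fold_reduction(base_pos: Sequence[float], base_neg: Sequence[float],
                      new_pos: Sequence[float], new_neg: Sequence[float],
                      sensitivity: float) -> float:
    """Hypothetical helper: FP(baseline) / FP(new model) at the same sensitivity."""
    fp_base = false_positives_at_sensitivity(base_pos, base_neg, sensitivity)
    fp_new = false_positives_at_sensitivity(new_pos, new_neg, sensitivity)
    return math.inf if fp_new == 0 else fp_base / fp_new


if __name__ == "__main__":
    # Toy scores invented for illustration only (not data from the paper).
    baseline_pos = [0.9, 0.8, 0.7, 0.6, 0.5]
    baseline_neg = [0.85, 0.75, 0.65, 0.55, 0.4, 0.3]
    new_pos = [0.95, 0.9, 0.85, 0.8, 0.3]
    new_neg = [0.82, 0.5, 0.4, 0.2, 0.1, 0.05]
    print(fp_fold_reduction(baseline_pos, baseline_neg, new_pos, new_neg, 0.8))  # 3.0
```

Under this reading, a 1.54-fold reduction means the baseline yields about 1.54 false positives for each false positive from DRAF at the same sensitivity; whether the reported averages are taken over per-dataset ratios or over pooled counts is not stated in this record.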
Record ID: pubmed-6037060
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Nucleic Acids Res
Dates: 2018-04-02 (published online); 2018-07-06 (issue)
Rights: © The Author(s) 2018. Published by Oxford University Press on behalf of Nucleic Acids Research.
License: This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
Topic: Methods Online