An efficient algorithm for improving structure-based prediction of transcription factor binding sites

BACKGROUND: Gene expression is regulated by transcription factors binding to specific target DNA sites. Understanding how and where transcription factors bind at genome scale represents an essential step toward our understanding of gene regulation networks. Previously we developed a structure-based...

Bibliographic Details
Main Authors: Farrel, Alvin; Guo, Jun-tao
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5514533/
https://www.ncbi.nlm.nih.gov/pubmed/28715997
http://dx.doi.org/10.1186/s12859-017-1755-0
_version_ 1783250859032838144
author Farrel, Alvin
Guo, Jun-tao
author_facet Farrel, Alvin
Guo, Jun-tao
author_sort Farrel, Alvin
collection PubMed
description BACKGROUND: Gene expression is regulated by transcription factors binding to specific target DNA sites. Understanding how and where transcription factors bind at the genome scale is an essential step toward understanding gene regulation networks. Previously we developed a structure-based method for predicting transcription factor binding sites using an integrative energy function that combines a knowledge-based multibody potential and two atomic energy terms. While the method performs well, it is not computationally efficient because the number of binding sequences to be evaluated grows exponentially with the length of the binding site. In this paper, we present an efficient pentamer algorithm that splits DNA binding sequences into overlapping fragments, along with a simplified integrative energy function for transcription factor binding site prediction.
RESULTS: A DNA binding sequence is split into overlapping pentamers (5 base pairs) for calculating the transcription factor-pentamer interaction energy. To combine the scores from overlapping pentamers, we developed two methods, Kmer-Sum and PWM (Position Weight Matrix) stacking, for full-length binding motif prediction. Our results show that both Kmer-Sum and PWM stacking in the new pentamer approach, together with a simplified integrative energy function, improved transcription factor binding site prediction accuracy and dramatically reduced computation time, especially for longer binding sites.
CONCLUSION: Our new fragment-based pentamer algorithm and simplified energy function improve both efficiency and accuracy. To our knowledge, this is the first fragment-based method for structure-based transcription factor binding site prediction.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-017-1755-0) contains supplementary material, which is available to authorized users.
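The fragment idea in the description can be illustrated with a short sketch. This is not the authors' implementation: the record does not define Kmer-Sum or PWM stacking in detail, so the code assumes Kmer-Sum simply adds the energies of the overlapping pentamer windows, and `toy_energy` is a hypothetical stand-in for the structure-based energy function. The evaluation counts likewise assume that exhaustive scoring covers all 4^L sequences of a site of length L, whereas the fragment approach scores all 4^5 pentamers at each of the L - 4 window positions.

```python
from typing import Callable, List

K = 5  # pentamer length, as stated in the description


def split_into_kmers(site: str, k: int = K) -> List[str]:
    """Return all overlapping k-mers of `site`, sliding one base at a time."""
    if len(site) < k:
        raise ValueError("site is shorter than k")
    return [site[i:i + k] for i in range(len(site) - k + 1)]


def kmer_sum_score(site: str,
                   window_energy: Callable[[int, str], float],
                   k: int = K) -> float:
    """Assumed Kmer-Sum: add the energy of every overlapping window.

    `window_energy(position, kmer)` is a hypothetical callback standing in
    for the transcription factor-pentamer interaction energy.
    """
    kmers = split_into_kmers(site, k)
    return sum(window_energy(pos, kmer) for pos, kmer in enumerate(kmers))


def evaluations_full(length: int) -> int:
    """Sequences to score when every full-length site is enumerated."""
    return 4 ** length


def evaluations_fragment(length: int, k: int = K) -> int:
    """Sequences to score when each window is enumerated independently."""
    return (length - k + 1) * 4 ** k


def toy_energy(position: int, kmer: str) -> float:
    """Toy placeholder: lower (better) energy for G/C-rich windows."""
    return -float(kmer.count("G") + kmer.count("C"))


if __name__ == "__main__":
    print(split_into_kmers("ACGTACGT"))
    print(kmer_sum_score("ACGTACGT", toy_energy))
    for length in (8, 12, 16):
        print(length, evaluations_full(length), evaluations_fragment(length))
```

Under these assumptions a 12 bp site needs 4^12 = 16,777,216 full-length evaluations but only 8 × 1,024 = 8,192 pentamer evaluations, which is consistent with the description's remark that the savings matter most for longer binding sites.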
format Online
Article
Text
id pubmed-5514533
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5514533 2017-07-19 An efficient algorithm for improving structure-based prediction of transcription factor binding sites Farrel, Alvin Guo, Jun-tao BMC Bioinformatics Research Article BioMed Central 2017-07-17 /pmc/articles/PMC5514533/ /pubmed/28715997 http://dx.doi.org/10.1186/s12859-017-1755-0 Text en © The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title An efficient algorithm for improving structure-based prediction of transcription factor binding sites
title_sort efficient algorithm for improving structure-based prediction of transcription factor binding sites
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5514533/
https://www.ncbi.nlm.nih.gov/pubmed/28715997
http://dx.doi.org/10.1186/s12859-017-1755-0