
Inclusion of neighboring base interdependencies substantially improves genome-wide prokaryotic transcription factor binding site prediction

Prediction of transcription factor binding sites is an important challenge in genome analysis. The advent of next generation genome sequencing technologies makes the development of effective computational approaches particularly imperative. We have developed a novel training-based methodology intend...


Bibliographic Details
Main authors: Salama, Rafik A., Stekel, Dov J.
Format: Text
Language: English
Published: Oxford University Press 2010
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2896541/
https://www.ncbi.nlm.nih.gov/pubmed/20439311
http://dx.doi.org/10.1093/nar/gkq274
author Salama, Rafik A.
Stekel, Dov J.
collection PubMed
description Prediction of transcription factor binding sites is an important challenge in genome analysis. The advent of next-generation genome sequencing technologies makes the development of effective computational approaches particularly imperative. We have developed a novel training-based methodology intended for prokaryotic transcription factor binding site prediction. Our methodology extends existing models by taking into account base interdependencies between neighbouring positions using conditional probabilities and includes genomic background weighting. This methodology has been tested against other existing and novel methodologies including position-specific weight matrices, first-order Hidden Markov Models and joint probability models. We have also tested the use of gapped and ungapped alignments and the inclusion or exclusion of background weighting. We show that our best method enhances binding site prediction for all of the 22 Escherichia coli transcription factors with at least 20 known binding sites, with many showing substantial improvements. We highlight the advantage of using block alignments of binding sites over gapped alignments to capture neighbouring position interdependencies. We also show that combining these methods with ChIP-on-chip data has the potential to further improve binding site prediction. Finally, we have developed the ungapped likelihood under positional background platform: a user-friendly website that gives access to the prediction method devised in this work.
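The general idea in the description, scoring a candidate site with conditional probabilities between neighbouring positions and weighting against a genomic background, can be illustrated with a small sketch. This is not the authors' implementation: the function names (`train_first_order`, `log_odds`, `scan`), the pseudocount, the uniform background, and the toy sites are all assumptions made for illustration, and the input is assumed to be uppercase A/C/G/T only.

```python
import math
from collections import Counter

BASES = "ACGT"


def train_first_order(sites, pseudocount=1.0):
    """Estimate P(base at position 0) and P(base at i | base at i-1)
    from an ungapped block of equal-length binding sites.
    Hypothetical helper; the pseudocount is an assumption."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("sites must form an ungapped block of equal length")

    # Position 0 has no left neighbour, so it gets plain base frequencies.
    first = Counter(s[0] for s in sites)
    total = len(sites) + 4 * pseudocount
    start = {b: (first[b] + pseudocount) / total for b in BASES}

    # One conditional table per position i >= 1, keyed by (previous, current).
    transitions = []
    for i in range(1, length):
        pairs = Counter((s[i - 1], s[i]) for s in sites)
        table = {}
        for prev in BASES:
            row_total = sum(pairs[(prev, b)] for b in BASES) + 4 * pseudocount
            for cur in BASES:
                table[(prev, cur)] = (pairs[(prev, cur)] + pseudocount) / row_total
        transitions.append(table)
    return start, transitions


def log_odds(seq, model, background):
    """Log-likelihood ratio of seq under the site model versus a
    zero-order background (a simplification chosen for this sketch)."""
    start, transitions = model
    score = math.log(start[seq[0]] / background[seq[0]])
    for i, table in enumerate(transitions, start=1):
        score += math.log(table[(seq[i - 1], seq[i])] / background[seq[i]])
    return score


def scan(genome, model, background, threshold):
    """Slide a window over the genome and keep windows scoring at or above threshold."""
    width = len(model[1]) + 1
    hits = []
    for pos in range(len(genome) - width + 1):
        window = genome[pos:pos + width]
        score = log_odds(window, model, background)
        if score >= threshold:
            hits.append((pos, window, score))
    return hits


# Toy data, invented for illustration only.
toy_sites = ["ACGT", "ACGT", "ACGA", "TCGT"]
uniform_background = {b: 0.25 for b in BASES}
toy_model = train_first_order(toy_sites)
toy_hits = scan("GGACGTGG", toy_model, uniform_background, threshold=1.0)
```

A position-specific weight matrix would use only per-position base frequencies; the per-position conditional tables here are what let the score depend on the preceding base.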
format Text
id pubmed-2896541
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-2896541 2010-07-06 Inclusion of neighboring base interdependencies substantially improves genome-wide prokaryotic transcription factor binding site prediction Salama, Rafik A. Stekel, Dov J. Nucleic Acids Res Methods Online Oxford University Press 2010-07 2010-05-03 /pmc/articles/PMC2896541/ /pubmed/20439311 http://dx.doi.org/10.1093/nar/gkq274 Text en © The Author(s) 2010. Published by Oxford University Press.
http://creativecommons.org/licenses/by-nc/2.5 This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Inclusion of neighboring base interdependencies substantially improves genome-wide prokaryotic transcription factor binding site prediction
topic Methods Online
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2896541/
https://www.ncbi.nlm.nih.gov/pubmed/20439311
http://dx.doi.org/10.1093/nar/gkq274