Assessing phylogenetic motif models for predicting transcription factor binding sites

Motivation: A variety of algorithms have been developed to predict transcription factor binding sites (TFBSs) within the genome by exploiting the evolutionary information implicit in multiple alignments of the genomes of related species. One such approach uses an extension of the standard position-s...

Bibliographic Details
Main authors:	Hawkins, John; Grant, Charles; Noble, William Stafford; Bailey, Timothy L.
Format:	Text
Language:	English
Published:	Oxford University Press 2009
Subjects:	ISMB/ECCB 2009 Conference Proceedings June 27 to July 2, 2009, Stockholm, Sweden
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2687955/ https://www.ncbi.nlm.nih.gov/pubmed/19478008 http://dx.doi.org/10.1093/bioinformatics/btp201

collection	PubMed
description	Motivation: A variety of algorithms have been developed to predict transcription factor binding sites (TFBSs) within the genome by exploiting the evolutionary information implicit in multiple alignments of the genomes of related species. One such approach uses an extension of the standard position-specific motif model that incorporates phylogenetic information via a phylogenetic tree and a model of evolution. However, these phylogenetic motif models (PMMs) have never been rigorously benchmarked in order to determine whether they lead to better prediction of TFBSs than obtained using simple position weight matrix scanning.
Results: We evaluate three PMM-based prediction algorithms, each of which uses a different treatment of gapped alignments, and we compare their prediction accuracy with that of a non-phylogenetic motif scanning approach. Surprisingly, all of these algorithms appear to be inferior to simple motif scanning, when accuracy is measured using a gold standard of validated yeast TFBSs. However, the PMM scanners perform much better than simple motif scanning when we abandon the gold standard and consider the number of statistically significant sites predicted, using column-shuffled ‘random’ motifs to measure significance. These results suggest that the common practice of measuring the accuracy of binding site predictors using collections of known sites may be dangerously misleading since such collections may be missing ‘weak’ sites, which are exactly the type of sites needed to discriminate among predictors. We then extend our previous theoretical model of the statistical power of PMM-based prediction algorithms to allow for loss of binding sites during evolution, and show that it gives a more accurate upper bound on scanner accuracy. Finally, utilizing our theoretical model, we introduce a new method for predicting the number of real binding sites in a genome. The results suggest that the number of true sites for a yeast TF is in general several times greater than the number of known sites listed in the Saccharomyces cerevisiae Database (SCPD). Among the three scanning algorithms that we test, the MONKEY algorithm has the highest accuracy for predicting yeast TFBSs.
Contact: j.hawkins@imb.uq.edu.au
id	pubmed-2687955
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
spelling	pubmed-2687955 2009-06-02 Bioinformatics Oxford University Press 2009-06-15 2009-05-27 /pmc/articles/PMC2687955/ /pubmed/19478008 http://dx.doi.org/10.1093/bioinformatics/btp201 Text en © 2009 The Author(s) http://creativecommons.org/licenses/by-nc/2.0/uk/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Assessing phylogenetic motif models for predicting transcription factor binding sites