
Variable structure motifs for transcription factor binding sites

BACKGROUND: Classically, models of DNA-transcription factor binding sites (TFBSs) have been based on relatively few known instances and have treated them as sites of fixed length using position weight matrices (PWMs). Various extensions to this model have been proposed, most of which take account of...


Bibliographic Details
Main authors: Reid, John E, Evans, Kenneth J, Dyer, Nigel, Wernisch, Lorenz, Ott, Sascha
Format: Text
Language: English
Published: BioMed Central 2010
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2824720/
https://www.ncbi.nlm.nih.gov/pubmed/20074339
http://dx.doi.org/10.1186/1471-2164-11-30
_version_ 1782177725811785728
author Reid, John E
Evans, Kenneth J
Dyer, Nigel
Wernisch, Lorenz
Ott, Sascha
author_facet Reid, John E
Evans, Kenneth J
Dyer, Nigel
Wernisch, Lorenz
Ott, Sascha
author_sort Reid, John E
collection PubMed
description BACKGROUND: Classically, models of DNA-transcription factor binding sites (TFBSs) have been based on relatively few known instances and have treated them as sites of fixed length using position weight matrices (PWMs). Various extensions to this model have been proposed, most of which take account of dependencies between the bases in the binding sites. However, some transcription factors are known to exhibit some flexibility and bind to DNA in more than one possible physical configuration. In some cases this variation is known to affect the function of binding sites. With the increasing volume of ChIP-seq data available, it is now possible to investigate models that incorporate this flexibility. Previous work on variable length models has been constrained by: a focus on specific zinc finger proteins in yeast using restrictive models; a reliance on hand-crafted models for just one transcription factor at a time; and a lack of evaluation on realistically sized data sets.
RESULTS: We re-analysed binding sites from the TRANSFAC database and found motivating examples where our new variable length model provides a better fit. We analysed several ChIP-seq data sets with a novel motif search algorithm and compared the results to one of the best standard PWM finders and a recently developed alternative method for finding motifs of variable structure. All the methods performed comparably in held-out cross validation tests. Known motifs of variable structure were recovered for p53, Stat5a and Stat5b. In addition, our method recovered a novel generalised version of an existing PWM for Sp1 that allows for variable length binding. This motif improved classification performance.
CONCLUSIONS: We have presented a new gapped PWM model for variable length DNA binding sites that is neither too restrictive nor over-parameterised. Our comparison with existing tools shows that on average it does not have better predictive accuracy than existing methods. However, it does provide more interpretable models of motifs of variable structure that are suitable for follow-up structural studies. To our knowledge, we are the first to apply variable length motif models to eukaryotic ChIP-seq data sets and consequently the first to show their value in this domain. The results include a novel motif for the ubiquitous transcription factor Sp1.
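As an illustration of the kind of model the abstract describes, the following minimal Python sketch scores a site against a PWM in which one column is optional, so that a site may take either of two lengths. It is not the authors' algorithm: the function names (`strong`, `score_fixed`, `score_gapped`, `best_hit`), the toy matrix, the uniform background, and the absence of any penalty or prior for omitting the optional column are all assumptions made for illustration.

```python
import math

BACKGROUND = 0.25  # assumption: uniform background base frequencies


def strong(base):
    """Hypothetical PWM column that strongly prefers one base."""
    return {b: 0.85 if b == base else 0.05 for b in "ACGT"}


def score_fixed(pwm, site):
    """Log-odds score (bits) of a site against a fixed-length PWM."""
    if len(site) != len(pwm):
        raise ValueError("site length must equal PWM width")
    return sum(math.log2(col[b] / BACKGROUND) for col, b in zip(pwm, site))


def score_gapped(pwm, optional, site):
    """Score a site against a PWM whose column `optional` may be absent.

    A full-length site uses every column; a site one base shorter is
    scored with the optional column removed.
    """
    if len(site) == len(pwm):
        return score_fixed(pwm, site)
    if len(site) == len(pwm) - 1:
        reduced = pwm[:optional] + pwm[optional + 1:]
        return score_fixed(reduced, site)
    raise ValueError("site length does not match either configuration")


def best_hit(pwm, optional, sequence):
    """Return (score, start, site) for the best hit in either configuration."""
    best = None
    for length in (len(pwm), len(pwm) - 1):
        for start in range(len(sequence) - length + 1):
            site = sequence[start:start + length]
            s = score_gapped(pwm, optional, site)
            if best is None or s > best[0]:
                best = (s, start, site)
    return best


# Toy motif G-C-(A)-T, where the A column (index 2) is optional.
toy_pwm = [strong("G"), strong("C"), strong("A"), strong("T")]
hit = best_hit(toy_pwm, 2, "TTGCTAA")
print(hit)
```

With this toy matrix, scanning `TTGCTAA` selects the shorter configuration `GCT` at offset 2, because no four-base window matches the full motif. A realistic model would also weigh how likely each configuration is, which this sketch deliberately leaves out.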
format Text
id pubmed-2824720
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-28247202010-02-20 Variable structure motifs for transcription factor binding sites Reid, John E Evans, Kenneth J Dyer, Nigel Wernisch, Lorenz Ott, Sascha BMC Genomics Research Article BACKGROUND: Classically, models of DNA-transcription factor binding sites (TFBSs) have been based on relatively few known instances and have treated them as sites of fixed length using position weight matrices (PWMs). Various extensions to this model have been proposed, most of which take account of dependencies between the bases in the binding sites. However, some transcription factors are known to exhibit some flexibility and bind to DNA in more than one possible physical configuration. In some cases this variation is known to affect the function of binding sites. With the increasing volume of ChIP-seq data available it is now possible to investigate models that incorporate this flexibility. Previous work on variable length models has been constrained by: a focus on specific zinc finger proteins in yeast using restrictive models; a reliance on hand-crafted models for just one transcription factor at a time; and a lack of evaluation on realistically sized data sets. RESULTS: We re-analysed binding sites from the TRANSFAC database and found motivating examples where our new variable length model provides a better fit. We analysed several ChIP-seq data sets with a novel motif search algorithm and compared the results to one of the best standard PWM finders and a recently developed alternative method for finding motifs of variable structure. All the methods performed comparably in held-out cross validation tests. Known motifs of variable structure were recovered for p53, Stat5a and Stat5b. In addition our method recovered a novel generalised version of an existing PWM for Sp1 that allows for variable length binding. This motif improved classification performance. 
CONCLUSIONS: We have presented a new gapped PWM model for variable length DNA binding sites that is not too restrictive nor over-parameterised. Our comparison with existing tools shows that on average it does not have better predictive accuracy than existing methods. However, it does provide more interpretable models of motifs of variable structure that are suitable for follow-up structural studies. To our knowledge, we are the first to apply variable length motif models to eukaryotic ChIP-seq data sets and consequently the first to show their value in this domain. The results include a novel motif for the ubiquitous transcription factor Sp1. BioMed Central 2010-01-14 /pmc/articles/PMC2824720/ /pubmed/20074339 http://dx.doi.org/10.1186/1471-2164-11-30 Text en Copyright ©2010 Reid et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Reid, John E
Evans, Kenneth J
Dyer, Nigel
Wernisch, Lorenz
Ott, Sascha
Variable structure motifs for transcription factor binding sites
title Variable structure motifs for transcription factor binding sites
title_full Variable structure motifs for transcription factor binding sites
title_fullStr Variable structure motifs for transcription factor binding sites
title_full_unstemmed Variable structure motifs for transcription factor binding sites
title_short Variable structure motifs for transcription factor binding sites
title_sort variable structure motifs for transcription factor binding sites
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2824720/
https://www.ncbi.nlm.nih.gov/pubmed/20074339
http://dx.doi.org/10.1186/1471-2164-11-30
work_keys_str_mv AT reidjohne variablestructuremotifsfortranscriptionfactorbindingsites
AT evanskennethj variablestructuremotifsfortranscriptionfactorbindingsites
AT dyernigel variablestructuremotifsfortranscriptionfactorbindingsites
AT wernischlorenz variablestructuremotifsfortranscriptionfactorbindingsites
AT ottsascha variablestructuremotifsfortranscriptionfactorbindingsites