
Recognition models to predict DNA-binding specificities of homeodomain proteins

Motivation: Recognition models for protein-DNA interactions, which allow the prediction of specificity for a DNA-binding domain based only on its sequence or the alteration of specificity through rational design, have long been a goal of computational biology. There has been some progress in constru...


Bibliographic Details
Main authors: Christensen, Ryan G., Enuameh, Metewo Selase, Noyes, Marcus B., Brodsky, Michael H., Wolfe, Scot A., Stormo, Gary D.
Format: Online Article Text
Language: English
Published: Oxford University Press 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3371834/
https://www.ncbi.nlm.nih.gov/pubmed/22689783
http://dx.doi.org/10.1093/bioinformatics/bts202
author Christensen, Ryan G.
Enuameh, Metewo Selase
Noyes, Marcus B.
Brodsky, Michael H.
Wolfe, Scot A.
Stormo, Gary D.
collection PubMed
description Motivation: Recognition models for protein-DNA interactions, which allow the prediction of specificity for a DNA-binding domain based only on its sequence or the alteration of specificity through rational design, have long been a goal of computational biology. There has been some progress in constructing useful models, especially for C2H2 zinc finger proteins, but it remains a challenging problem with ample room for improvement. For most families of transcription factors, the best available methods use k-nearest neighbor (KNN) algorithms, which predict specificity as the average of the specificities of the k most similar proteins with defined specificities. Homeodomain (HD) proteins are the second most abundant family of transcription factors, after zinc fingers, in most metazoan genomes; as a consequence, an effective recognition model for this family would facilitate predictive models of many transcriptional regulatory networks within these genomes. Results: Using extensive experimental data, we have tested several machine learning approaches and find that both support vector machines and random forests (RFs) can produce recognition models for HD proteins that are significant improvements over KNN-based methods. Cross-validation analyses show that the resulting models are capable of predicting specificities with high accuracy. We have produced a web-based prediction tool, PreMoTF (Predicted Motifs for Transcription Factors) (http://stormo.wustl.edu/PreMoTF), for predicting position frequency matrices from protein sequence using an RF-based model. Contact: stormo@wustl.edu
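The KNN baseline mentioned in the description (predicting a specificity as the average of the specificities of the k most similar characterized proteins) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the similarity measure (fraction of identical residues between pre-aligned, equal-length sequences), the function names `identity` and `knn_predict`, and the toy sequences and matrices are all hypothetical.

```python
from typing import Dict, List

# A position frequency matrix (PFM): one row per motif position,
# columns ordered A, C, G, T.
PFM = List[List[float]]


def identity(a: str, b: str) -> float:
    """Fraction of identical residues between two pre-aligned sequences.

    Assumption: percent identity stands in for whatever similarity
    measure a real KNN method would use.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def knn_predict(query: str, known: Dict[str, PFM], k: int = 3) -> PFM:
    """Average the PFMs of the k known proteins most similar to the query."""
    # Rank characterized proteins by similarity to the query, keep the top k.
    ranked = sorted(known, key=lambda seq: identity(query, seq), reverse=True)[:k]
    n_pos = len(known[ranked[0]])
    # Element-wise mean of the neighbors' matrices.
    return [
        [sum(known[seq][i][j] for seq in ranked) / len(ranked) for j in range(4)]
        for i in range(n_pos)
    ]


# Hypothetical toy data: 4-residue "proteins" with 2-position motifs.
toy_known: Dict[str, PFM] = {
    "RQVK": [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]],
    "RQVA": [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]],
    "KNIE": [[0.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 0.0]],
}

if __name__ == "__main__":
    print(knn_predict("RQVR", toy_known, k=2))
```

With `k=2`, the query `RQVR` is matched to `RQVK` and `RQVA` (three of four residues identical in each case), and the prediction is the element-wise mean of their two matrices. The SVM and RF models described in the record replace this averaging step with a learned mapping from sequence to specificity.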
format Online
Article
Text
id pubmed-3371834
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-3371834 2012-06-11 Recognition models to predict DNA-binding specificities of homeodomain proteins Christensen, Ryan G. Enuameh, Metewo Selase Noyes, Marcus B. Brodsky, Michael H. Wolfe, Scot A. Stormo, Gary D. Bioinformatics ISMB 2012 Proceedings Papers Committee July 15 to July 19, 2012, Long Beach, CA, USA Oxford University Press 2012-06-15 2012-06-09 /pmc/articles/PMC3371834/ /pubmed/22689783 http://dx.doi.org/10.1093/bioinformatics/bts202 Text en © The Author(s) 2012. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Recognition models to predict DNA-binding specificities of homeodomain proteins
topic ISMB 2012 Proceedings Papers Committee July 15 to July 19, 2012, Long Beach, CA, USA
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3371834/
https://www.ncbi.nlm.nih.gov/pubmed/22689783
http://dx.doi.org/10.1093/bioinformatics/bts202