Discovery of protein–DNA interactions by penalized multivariate regression

Discovering which regulatory proteins, especially transcription factors (TFs), are active under certain experimental conditions and identifying the corresponding binding motifs is essential for understanding the regulatory circuits that control cellular programs. The experimental methods used for th...

Bibliographic Details
Main Authors: Zamdborg, Leonid; Ma, Ping
Format: Text
Language: English
Published: Oxford University Press 2009
Subjects: Computational Biology
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2760818/
https://www.ncbi.nlm.nih.gov/pubmed/19578060
http://dx.doi.org/10.1093/nar/gkp554
author Zamdborg, Leonid
Ma, Ping
collection PubMed
description Discovering which regulatory proteins, especially transcription factors (TFs), are active under certain experimental conditions and identifying the corresponding binding motifs is essential for understanding the regulatory circuits that control cellular programs. The experimental methods used for this purpose are laborious. Computational methods have been proven extremely effective in identifying TF-binding motifs (TFBMs). In this article, we propose a novel computational method called MotifExpress for discovering active TFBMs. Unlike existing methods, which either use only DNA sequence information or integrate sequence information with a single-sample measurement of gene expression, MotifExpress integrates DNA sequence information with gene expression measured in multiple samples. By selecting TFBMs that are significantly associated with gene expression, we can identify active TFBMs under specific experimental conditions and thus provide clues for the construction of regulatory networks. Compared with existing methods, MotifExpress substantially reduces the number of spurious results. Statistically, MotifExpress uses a penalized multivariate regression approach with a composite absolute penalty, which is highly stable and can effectively find the globally optimal set of active motifs. We demonstrate the excellent performance of MotifExpress by applying it to synthetic data and real examples of Saccharomyces cerevisiae. MotifExpress is available at http://www.stat.illinois.edu/~pingma/MotifExpress.htm.
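The description above mentions a penalized multivariate regression that selects motifs associated with expression across multiple samples. A minimal sketch of that general idea follows. It is not the MotifExpress implementation: it uses a row-wise group penalty (L1 across motifs, L2 across samples), which is only one simple member of the composite absolute penalty family, and it is solved by proximal gradient descent. The function names, the synthetic motif counts, the planted effects, and the value of `lam` are all assumptions made for illustration.

```python
import numpy as np


def fit_row_sparse_regression(X, Y, lam, n_iter=500):
    """Minimise (1/2n)*||Y - X B||_F^2 + lam * sum_j ||B[j, :]||_2.

    X: genes x motifs (e.g. motif occurrence scores), Y: genes x samples.
    Each row of B holds one motif's effects across all samples, so the
    penalty keeps or drops a motif for all samples at once.
    """
    n, p = X.shape
    B = np.zeros((p, Y.shape[1]))
    # Step size 1/L, where L = sigma_max(X)^2 / n bounds the gradient's
    # Lipschitz constant for the squared-error term.
    step = n / (np.linalg.norm(X, 2) ** 2)
    for _ in range(n_iter):
        grad = X.T @ (X @ B - Y) / n
        Z = B - step * grad
        # Proximal step: shrink each row's norm by step*lam, zeroing weak rows.
        norms = np.linalg.norm(Z, axis=1, keepdims=True)
        scale = np.maximum(0.0, 1.0 - step * lam / np.maximum(norms, 1e-12))
        B = scale * Z
    return B


def simulate(seed=0, n_genes=200, n_motifs=30, n_samples=5):
    """Hypothetical data: only the first three motifs affect expression."""
    rng = np.random.default_rng(seed)
    X = rng.poisson(1.0, size=(n_genes, n_motifs)).astype(float)
    X -= X.mean(axis=0)  # centre each motif column
    B_true = np.zeros((n_motifs, n_samples))
    B_true[0], B_true[1], B_true[2] = 2.0, -2.0, 1.5
    Y = X @ B_true + 0.1 * rng.standard_normal((n_genes, n_samples))
    return X, Y, B_true


if __name__ == "__main__":
    X, Y, B_true = simulate()
    B = fit_row_sparse_regression(X, Y, lam=0.5)
    # Motifs whose coefficient row is non-zero are the ones selected as active.
    active = np.flatnonzero(np.linalg.norm(B, axis=1) > 1e-8)
    print("selected motif indices:", active.tolist())
```

Penalizing whole rows is what ties the samples together: a motif is retained only if its combined effect across all samples is large enough, which is one plausible way a multi-sample method could suppress motifs that look significant in a single sample by chance.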
format Text
id pubmed-2760818
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-2760818 2009-10-13 Discovery of protein–DNA interactions by penalized multivariate regression Zamdborg, Leonid Ma, Ping Nucleic Acids Res Computational Biology
Oxford University Press 2009-09 2009-07-03 /pmc/articles/PMC2760818/ /pubmed/19578060 http://dx.doi.org/10.1093/nar/gkp554 Text en © 2009 The Author(s) http://creativecommons.org/licenses/by-nc/2.0/uk/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Discovery of protein–DNA interactions by penalized multivariate regression
topic Computational Biology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2760818/
https://www.ncbi.nlm.nih.gov/pubmed/19578060
http://dx.doi.org/10.1093/nar/gkp554