CSI-Tree: a regression tree approach for modeling binding properties of DNA-binding molecules based on cognate site identification (CSI) data

Bibliographic Details
Main authors:	Keleş, Sündüz; Warren, Christopher L.; Carlson, Clayton D.; Ansari, Aseem Z.
Format:	Text
Language:	English
Published:	Oxford University Press 2008
Subjects:	Computational Biology
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2425502/ https://www.ncbi.nlm.nih.gov/pubmed/18411210 http://dx.doi.org/10.1093/nar/gkn057

Abstract:	The identification and characterization of binding sites of DNA-binding molecules, including transcription factors (TFs), is a critical problem at the interface of chemistry, biology and molecular medicine. The Cognate Site Identification (CSI) array is a high-throughput microarray platform for measuring comprehensive recognition profiles of DNA-binding molecules. This technique produces datasets that are useful not only for identifying binding sites of previously uncharacterized TFs but also for elucidating dependencies, both local and nonlocal, between the nucleotides at different positions of the recognition sites. We have developed a regression tree technique, CSI-Tree, for exploring the spectrum of binding sites of DNA-binding molecules. Our approach constructs regression trees utilizing the CSI data of unaligned sequences. The resulting model partitions the binding spectrum into homogeneous regions of position specific nucleotide effects. Each homogeneous partition is then summarized by a position weight matrix (PWM). Hence, the final outcome is a binding intensity rank-ordered collection of PWMs each of which spans a different region in the binding spectrum. Nodes of the regression tree depict the critical position/nucleotide combinations. We analyze the CSI data of the eukaryotic TF Nkx-2.5 and two engineered small molecule DNA ligands and obtain unique insights into their binding properties. The CSI tree for Nkx-2.5 reveals an interaction between two positions of the binding profile and elucidates how different nucleotide combinations at these two positions lead to different binding affinities. The CSI trees for the engineered DNA ligands exhibit a common preference for the dinucleotide AA in the first two positions, which is consistent with preference for a narrow and relatively flat minor groove. We carry out a reanalysis of these data with a mixture of PWMs approach. This approach is an advancement over the simple PWM model and accommodates position dependencies based on only sequence data. Our analysis indicates that the dependencies revealed by the CSI-Tree are challenging to discover without the actual binding intensities. Moreover, such a mixture model is highly sensitive to the number and length of the sequences analyzed. In contrast, CSI-Tree provides interpretable and concise summaries of the complete recognition profiles of DNA-binding molecules by utilizing binding affinities.
Journal:	Nucleic Acids Res
Collection:	PubMed
Institution:	National Center for Biotechnology Information
Record ID:	pubmed-2425502
Record format:	MEDLINE/PubMed
Dates listed in record:	2008-06-12; 2008-06; 2008-04-13
License:	© 2008 The Author(s). This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
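
Illustrative note: the abstract in this record describes a pipeline in which a regression tree splits sequences on position/nucleotide combinations, each leaf is summarized by a PWM, and the PWMs are ranked by binding intensity. The following is a minimal sketch of that idea, not the authors' implementation. It assumes fixed-length, already aligned sequences (the paper works with unaligned sequences), a plain squared-error split criterion, and invented toy data; all function names (`best_split`, `grow`, `pwm`, `csi_tree_sketch`) are hypothetical.

```python
from collections import Counter

NUCS = "ACGT"

def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(seqs, ys):
    """Return (gain, position, nucleotide) for the indicator split that most reduces SSE."""
    best = None
    base = sse(ys)
    for pos in range(len(seqs[0])):
        for nuc in NUCS:
            left = [y for s, y in zip(seqs, ys) if s[pos] == nuc]
            right = [y for s, y in zip(seqs, ys) if s[pos] != nuc]
            if not left or not right:
                continue  # a split must send sequences to both sides
            gain = base - sse(left) - sse(right)
            if best is None or gain > best[0]:
                best = (gain, pos, nuc)
    return best

def grow(seqs, ys, depth=0, max_depth=2, min_gain=1e-9):
    """Recursively partition the sequences; return the leaves as (seqs, intensities) pairs."""
    split = best_split(seqs, ys) if depth < max_depth else None
    if split is None or split[0] <= min_gain:
        return [(seqs, ys)]
    _, pos, nuc = split
    yes = [(s, y) for s, y in zip(seqs, ys) if s[pos] == nuc]
    no = [(s, y) for s, y in zip(seqs, ys) if s[pos] != nuc]
    leaves = []
    for part in (yes, no):
        part_seqs, part_ys = zip(*part)
        leaves += grow(list(part_seqs), list(part_ys), depth + 1, max_depth, min_gain)
    return leaves

def pwm(seqs, pseudocount=0.0):
    """Column-wise nucleotide frequencies for the sequences in one leaf."""
    total = len(seqs) + 4 * pseudocount
    matrix = []
    for pos in range(len(seqs[0])):
        counts = Counter(s[pos] for s in seqs)
        matrix.append({n: (counts[n] + pseudocount) / total for n in NUCS})
    return matrix

def csi_tree_sketch(seqs, ys, max_depth=2):
    """Return (mean intensity, PWM) per leaf, ordered from highest to lowest mean intensity."""
    leaves = grow(seqs, ys, max_depth=max_depth)
    summaries = [(sum(l_ys) / len(l_ys), pwm(l_seqs)) for l_seqs, l_ys in leaves]
    return sorted(summaries, key=lambda item: item[0], reverse=True)

# Invented toy data: intensity is high only when positions 0 and 1 are both "A".
seqs = ["AAG", "AAC", "ATG", "CAG", "CTC", "GTC"]
ys = [9.0, 9.0, 2.0, 2.0, 2.0, 2.0]
ranked = csi_tree_sketch(seqs, ys)
```

In this toy setting the top-ranked leaf collects the sequences that start with AA, so its PWM puts all weight on A in the first two columns. Real CSI data would need intensity normalization, a pruning or stopping rule, and handling of unaligned sequences, none of which this sketch attempts.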