CSI-Tree: a regression tree approach for modeling binding properties of DNA-binding molecules based on cognate site identification (CSI) data

The identification and characterization of binding sites of DNA-binding molecules, including transcription factors (TFs), is a critical problem at the interface of chemistry, biology and molecular medicine. The Cognate Site Identification (CSI) array is a high-throughput microarray platform for meas...

Bibliographic Details
Main Authors: Keleş, Sündüz; Warren, Christopher L.; Carlson, Clayton D.; Ansari, Aseem Z.
Format: Text
Language: English
Published: Oxford University Press 2008
Subjects: Computational Biology
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2425502/
https://www.ncbi.nlm.nih.gov/pubmed/18411210
http://dx.doi.org/10.1093/nar/gkn057
_version_ 1782156271330263040
author Keleş, Sündüz
Warren, Christopher L.
Carlson, Clayton D.
Ansari, Aseem Z.
collection PubMed
description The identification and characterization of binding sites of DNA-binding molecules, including transcription factors (TFs), is a critical problem at the interface of chemistry, biology and molecular medicine. The Cognate Site Identification (CSI) array is a high-throughput microarray platform for measuring comprehensive recognition profiles of DNA-binding molecules. This technique produces datasets that are useful not only for identifying binding sites of previously uncharacterized TFs but also for elucidating dependencies, both local and nonlocal, between the nucleotides at different positions of the recognition sites. We have developed a regression tree technique, CSI-Tree, for exploring the spectrum of binding sites of DNA-binding molecules. Our approach constructs regression trees utilizing the CSI data of unaligned sequences. The resulting model partitions the binding spectrum into homogeneous regions of position-specific nucleotide effects. Each homogeneous partition is then summarized by a position weight matrix (PWM). Hence, the final outcome is a collection of PWMs, rank-ordered by binding intensity, each of which spans a different region in the binding spectrum. Nodes of the regression tree depict the critical position/nucleotide combinations. We analyze the CSI data of the eukaryotic TF Nkx-2.5 and two engineered small-molecule DNA ligands and obtain unique insights into their binding properties. The CSI tree for Nkx-2.5 reveals an interaction between two positions of the binding profile and elucidates how different nucleotide combinations at these two positions lead to different binding affinities. The CSI trees for the engineered DNA ligands exhibit a common preference for the dinucleotide AA in the first two positions, which is consistent with a preference for a narrow and relatively flat minor groove. We carry out a reanalysis of these data with a mixture-of-PWMs approach. This approach is an advancement over the simple PWM model and accommodates position dependencies based only on sequence data. Our analysis indicates that the dependencies revealed by the CSI-Tree are challenging to discover without the actual binding intensities. Moreover, such a mixture model is highly sensitive to the number and length of the sequences analyzed. In contrast, CSI-Tree provides interpretable and concise summaries of the complete recognition profiles of DNA-binding molecules by utilizing binding affinities.
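The idea summarized in the description (split on position/nucleotide combinations, then summarize each partition by a PWM and rank the PWMs by binding intensity) can be illustrated with a minimal sketch. This is not the authors' CSI-Tree implementation. It assumes aligned, fixed-length sequences (the paper works with unaligned sequences), a plain squared-error split criterion, and made-up toy intensities produced by the hypothetical `toy_intensity` function; all function names here are illustrative only.

```python
from itertools import product

NUCS = "ACGT"

def mean(values):
    return sum(values) / len(values)

def sse(values):
    """Sum of squared deviations from the mean (node impurity)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def best_split(data, min_leaf):
    """Return (gain, position, nucleotide) for the best indicator split, or None."""
    base = sse([y for _, y in data])
    best = None
    for pos in range(len(data[0][0])):
        for nuc in NUCS:
            yes = [y for s, y in data if s[pos] == nuc]
            no = [y for s, y in data if s[pos] != nuc]
            if len(yes) < min_leaf or len(no) < min_leaf:
                continue
            gain = base - sse(yes) - sse(no)
            if best is None or gain > best[0]:
                best = (gain, pos, nuc)
    return best

def grow(data, max_depth=2, min_leaf=2, depth=0):
    """Grow a regression tree; internal nodes hold a position/nucleotide test."""
    split = best_split(data, min_leaf) if depth < max_depth else None
    if split is None or split[0] <= 1e-9:
        return {"mean": mean([y for _, y in data]), "seqs": [s for s, _ in data]}
    _, pos, nuc = split
    yes = [(s, y) for s, y in data if s[pos] == nuc]
    no = [(s, y) for s, y in data if s[pos] != nuc]
    return {"pos": pos, "nuc": nuc,
            "yes": grow(yes, max_depth, min_leaf, depth + 1),
            "no": grow(no, max_depth, min_leaf, depth + 1)}

def leaves(node):
    """Collect the leaf nodes (the partitions of the binding spectrum)."""
    if "seqs" in node:
        return [node]
    return leaves(node["yes"]) + leaves(node["no"])

def pwm(seqs):
    """Per-position nucleotide frequencies for the sequences in one leaf."""
    n = len(seqs)
    return [{nuc: sum(s[i] == nuc for s in seqs) / n for nuc in NUCS}
            for i in range(len(seqs[0]))]

def toy_intensity(seq):
    """Made-up intensities: the effect of position 1 depends on position 0."""
    if seq[0] == "T":
        return 3.0 if seq[1] == "G" else 1.0
    return 0.0

# Toy data set: every 3-mer paired with its synthetic intensity.
data = [("".join(p), toy_intensity("".join(p))) for p in product(NUCS, repeat=3)]
tree = grow(data)

# Rank the leaf partitions by mean intensity and summarize each by a PWM.
ranked = sorted(leaves(tree), key=lambda leaf: leaf["mean"], reverse=True)
for leaf in ranked:
    print(leaf["mean"], len(leaf["seqs"]), pwm(leaf["seqs"]))
```

In this toy setting the tree needs the measured intensities to find the dependency between the first two positions; the sequences alone are uniformly distributed and carry no such signal, which loosely mirrors the contrast the description draws with sequence-only mixture models.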
format Text
id pubmed-2425502
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-2425502 2008-06-12 Nucleic Acids Res Computational Biology Oxford University Press 2008-06 2008-04-13 /pmc/articles/PMC2425502/ /pubmed/18411210 http://dx.doi.org/10.1093/nar/gkn057 Text en © 2008 The Author(s) http://creativecommons.org/licenses/by-nc/2.0/uk/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title CSI-Tree: a regression tree approach for modeling binding properties of DNA-binding molecules based on cognate site identification (CSI) data
topic Computational Biology