
Tandem machine learning for the identification of genes regulated by transcription factors

BACKGROUND: The identification of promoter regions that are regulated by a given transcription factor has traditionally relied upon the identification and distributions of binding sites recognized by the factor. In this study, we have developed a tandem machine learning approach for the identificati...


Bibliographic Details
Main Authors: Dinakarpandian, Deendayal; Raheja, Venetia; Mehta, Saumil; Schuetz, Erin G; Rogan, Peter K
Format: Text
Language: English
Published: BioMed Central 2005
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1208855/
https://www.ncbi.nlm.nih.gov/pubmed/16115317
http://dx.doi.org/10.1186/1471-2105-6-204
_version_ 1782124916580024320
author Dinakarpandian, Deendayal
Raheja, Venetia
Mehta, Saumil
Schuetz, Erin G
Rogan, Peter K
collection PubMed
description BACKGROUND: The identification of promoter regions that are regulated by a given transcription factor has traditionally relied upon the identification and distributions of binding sites recognized by the factor. In this study, we have developed a tandem machine learning approach for the identification of regulatory target genes based on these parameters and on the corresponding binding site information contents that measure the affinities of the factor for these cognate elements. RESULTS: This method has been validated using models of DNA binding sites recognized by the xenobiotic-sensitive nuclear receptor, PXR/RXRα, for target genes within the human genome. An information theory-based weight matrix was first derived and refined from known PXR/RXRα binding sites. The promoter region of candidate genes was scanned with the weight matrix. A novel information density-based clustering algorithm was then used to identify clusters of information-rich sites. Finally, transformed data representing metrics of location, strength and clustering of binding sites were used for classification of promoter regions using an ensemble approach involving neural networks, decision trees and Naïve Bayesian classification. The method was evaluated on a set of 24 known target genes and 288 genes known not to be regulated by PXR/RXRα. We report an average accuracy (proportion of correctly classified promoter regions) of 71%, sensitivity of 73%, and specificity of 70%, based on multiple cross-validation and the leave-one-out strategy. The performance on a test set of 13 genes showed that 10 were correctly classified. CONCLUSION: We have developed a machine learning approach for the successful detection of gene targets for transcription factors with high accuracy. The method has been validated for the transcription factor PXR/RXRα and has the potential to be extended to other transcription factors.
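The first two steps in the abstract (deriving an information-based weight matrix from known sites, then scanning a promoter with it) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes one common formulation in which the weight of base b at position l is 2 + log2 f(b, l) bits, with a pseudocount and no small-sample correction; the record does not state the exact formula used. The function names, the threshold, and the toy sequences are hypothetical and are not real PXR/RXRα binding sites.

```python
import math

BASES = "ACGT"


def information_weight_matrix(sites, pseudocount=0.5):
    """Build a per-position weight matrix (in bits) from aligned sites.

    Assumed formulation: weight(b, l) = 2 + log2 f(b, l), where f is the
    pseudocount-smoothed frequency of base b at position l.
    Small-sample corrections are deliberately omitted in this sketch.
    """
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    matrix = []
    for pos in range(length):
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        matrix.append(
            {base: 2.0 + math.log2(counts[base] / total) for base in BASES}
        )
    return matrix


def scan(sequence, matrix, threshold=0.0):
    """Return (offset, bits) for every window scoring above the threshold."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        bits = sum(matrix[i][base] for i, base in enumerate(window))
        if bits > threshold:
            hits.append((start, bits))
    return hits


# Hypothetical toy data for illustration only.
toy_sites = ["AGGTCA", "AGTTCA", "GGGTCA", "AGGTCA"]
matrix = information_weight_matrix(toy_sites)
hits = scan("TTTAGGTCATTT", matrix, threshold=5.0)
```

In this sketch the offsets and bit scores returned by scan correspond to the location and strength metrics that the abstract feeds into later stages. The clustering and ensemble classification steps are not shown.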
format Text
id pubmed-1208855
institution National Center for Biotechnology Information
language English
publishDate 2005
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1208855 2005-09-15 BMC Bioinformatics Research Article BioMed Central 2005-08-22 /pmc/articles/PMC1208855/ /pubmed/16115317 http://dx.doi.org/10.1186/1471-2105-6-204 Text en Copyright © 2005 Dinakarpandian et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Tandem machine learning for the identification of genes regulated by transcription factors
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1208855/
https://www.ncbi.nlm.nih.gov/pubmed/16115317
http://dx.doi.org/10.1186/1471-2105-6-204