A general integrative genomic feature transcription factor binding site prediction method applied to analysis of USF1 binding in cardiovascular disease

Transcription factors are key mediators of human complex disease processes. Identifying the target genes of transcription factors will increase our understanding of the biological network leading to disease risk. The prediction of transcription factor binding sites (TFBSs) is one method to identify...

Bibliographic Details
Main Authors: Wang, Tianyuan, Furey, Terrence S, Connelly, Jessica J, Ji, Shihao, Nelson, Sarah, Heber, Steffen, Gregory, Simon G, Hauser, Elizabeth R
Format: Text
Language: English
Published: BioMed Central 2009
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2742312/
https://www.ncbi.nlm.nih.gov/pubmed/19403457
http://dx.doi.org/10.1186/1479-7364-3-3-221
author Wang, Tianyuan
Furey, Terrence S
Connelly, Jessica J
Ji, Shihao
Nelson, Sarah
Heber, Steffen
Gregory, Simon G
Hauser, Elizabeth R
author_sort Wang, Tianyuan
collection PubMed
description Transcription factors are key mediators of human complex disease processes. Identifying the target genes of transcription factors will increase our understanding of the biological network leading to disease risk. The prediction of transcription factor binding sites (TFBSs) is one method to identify these target genes; however, current prediction methods need improvement. We chose the transcription factor upstream stimulatory factor 1 (USF1) to evaluate the performance of our novel TFBS prediction method because of its known genetic association with coronary artery disease (CAD) and the recent availability of USF1 chromatin immunoprecipitation microarray (ChIP-chip) results. The specific goals of our study were to develop a novel and accurate genome-scale method for predicting USF1 binding sites and associated target genes to aid in the study of CAD. Previously published USF1 ChIP-chip data for 1 per cent of the genome were used to develop and evaluate several kernel logistic regression prediction models. A combination of genomic features (phylogenetic conservation, regulatory potential, presence of a CpG island and DNaseI hypersensitivity), as well as position weight matrix (PWM) scores, were used as variables for these models. Our most accurate predictor achieved an area under the receiver operator characteristic curve of 0.827 during cross-validation experiments, significantly outperforming standard PWM-based prediction methods. When applied to the whole human genome, we predicted 24,010 USF1 binding sites within 5 kilobases upstream of the transcription start site of 9,721 genes. These predictions included 16 of 20 genes with strong evidence of USF1 regulation. Finally, in the spirit of genomic convergence, we integrated independent experimental CAD data with these USF1 binding site prediction results to develop a prioritised set of candidate genes for future CAD studies.
We have shown that our novel prediction method, which employs genomic features related to the presence of regulatory elements, enables more accurate and efficient prediction of USF1 binding sites. This method can be extended to other transcription factors identified in human disease studies to help further our understanding of the biology of complex disease.
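The abstract compares its predictor against a PWM-only baseline using the area under the ROC curve. The following is a minimal illustrative sketch of those two ingredients only, not the authors' implementation: the count matrix is a hypothetical E-box-like motif (CACGTG) invented for this example, the background is assumed uniform, and the function names `log_odds_matrix`, `best_pwm_score` and `roc_auc` are assumptions. The kernel logistic regression model and the genomic features are not reproduced here.

```python
import math

BASES = "ACGT"
BACKGROUND = 0.25  # assumed uniform base composition (illustrative)

# Hypothetical base counts for a 6-bp E-box-like motif (CACGTG).
# These numbers are invented for illustration, not taken from the paper.
COUNTS = [
    {"A": 1, "C": 17, "G": 1, "T": 1},
    {"A": 17, "C": 1, "G": 1, "T": 1},
    {"A": 1, "C": 16, "G": 1, "T": 2},
    {"A": 1, "C": 1, "G": 17, "T": 1},
    {"A": 1, "C": 1, "G": 1, "T": 17},
    {"A": 1, "C": 1, "G": 17, "T": 1},
]


def log_odds_matrix(counts, pseudocount=0.5):
    """Convert per-position base counts into log2 odds against the background."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + pseudocount * len(BASES)
        matrix.append({
            base: math.log2(((column[base] + pseudocount) / total) / BACKGROUND)
            for base in BASES
        })
    return matrix


def best_pwm_score(sequence, pwm):
    """Highest log-odds score over all windows of the sequence.

    Assumes an uppercase A/C/G/T sequence at least as long as the motif.
    """
    width = len(pwm)
    return max(
        sum(pwm[i][sequence[start + i]] for i in range(width))
        for start in range(len(sequence) - width + 1)
    )


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Assumes both classes are present.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    correct = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return correct / (len(positives) * len(negatives))


if __name__ == "__main__":
    pwm = log_odds_matrix(COUNTS)
    # Toy regions: label 1 marks a made-up "bound" region, 0 an "unbound" one.
    regions = [("TTCACGTGAA", 1), ("GGCACGTGCC", 1), ("TTTTTTTTTT", 0), ("ACACACACAC", 0)]
    scores = [best_pwm_score(seq, pwm) for seq, _ in regions]
    labels = [label for _, label in regions]
    print("PWM-only AUC on toy data:", roc_auc(labels, scores))
```

In a setting like the one described, the PWM score would be one input variable alongside the genomic features, and the AUC would be computed on held-out folds during cross-validation rather than on the training regions.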
format Text
id pubmed-2742312
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2742312 2010-04-01 Hum Genomics Research BioMed Central 2009-04-01 /pmc/articles/PMC2742312/ /pubmed/19403457 http://dx.doi.org/10.1186/1479-7364-3-3-221 Text en Copyright ©2009 Henry Stewart Publications
title A general integrative genomic feature transcription factor binding site prediction method applied to analysis of USF1 binding in cardiovascular disease
topic Research