Genome-wide prediction, display and refinement of binding sites with information theory-based models
Main authors: | Gadiraju, Sashidhar; Vyhlidal, Carrie A; Leeder, J Steven; Rogan, Peter K |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central, 2003 |
Subjects: | Software |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC200970/ https://www.ncbi.nlm.nih.gov/pubmed/12962546 http://dx.doi.org/10.1186/1471-2105-4-38 |
author | Gadiraju, Sashidhar; Vyhlidal, Carrie A; Leeder, J Steven; Rogan, Peter K |
---|---|
collection | PubMed |
description | BACKGROUND: We present Delila-Genome, a software system for identification, visualization and analysis of protein binding sites in complete genome sequences. Binding sites are predicted by scanning genomic sequences with information theory-based (or user-defined) weight matrices. Matrices are refined by adding experimentally defined binding sites to published binding sites. Delila-Genome was used to examine the accuracy of individual information contents of binding sites detected with refined matrices as a measure of the strengths of the corresponding protein-nucleic acid interactions. The software can then be used to predict novel sites by rescanning the genome with the refined matrices. RESULTS: Parameters for genome scans are entered using a Java-based GUI and backend scripts in Perl. Multi-processor CPU load-sharing minimized the average response time for scans of different chromosomes. Scans of human genome assemblies required 4–6 hours for transcription factor binding sites and 10–19 hours for splice sites, respectively, on 24- and 3-node Mosix and Beowulf clusters. Individual binding sites are displayed either as high-resolution sequence walkers or in low-resolution custom tracks in the UCSC genome browser. For large datasets, we applied a data reduction strategy that limited displays of binding sites exceeding a threshold information content to specific chromosomal regions within or adjacent to genes. An HTML document is produced that lists binding sites, ranked by binding site strength or chromosomal location, with hyperlinks to the UCSC custom track, other annotation databases and binding site sequences. Post-genome scan tools parse binding site annotations of selected chromosome intervals and compare the results of genome scans using different weight matrices. Comparisons of multiple genome scans can display binding sites that are unique to each scan and identify sites with significantly altered binding strengths. CONCLUSIONS: Delila-Genome was used to scan the human genome sequence with information weight matrices of transcription factor binding sites, including PXR/RXRα, AHR and NF-κB p50/p65, and matrices for RNA binding sites including splice donor, acceptor, and SC35 recognition sites. Comparisons of genome scans with the original and refined PXR/RXRα information weight matrices indicate that the refined model more accurately predicts the strengths of known binding sites and is more sensitive for detection of novel binding sites. |
format | Text |
id | pubmed-200970 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2003 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-200970 2003-09-30 BMC Bioinformatics Software BioMed Central 2003-09-08 /pmc/articles/PMC200970/ /pubmed/12962546 http://dx.doi.org/10.1186/1471-2105-4-38 Text en Copyright © 2003 Gadiraju et al; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL. |
title | Genome-wide prediction, display and refinement of binding sites with information theory-based models |
topic | Software |
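
The matrix-based scanning, refinement and scan-comparison steps summarized in the description can be illustrated with a minimal Python sketch. It is not Delila-Genome's implementation: it assumes Schneider's individual information formulation, Riw(b,l) = 2 + log2 f(b,l) - e(n(l)), with e(n) taken from its large-sample approximation 3/(2 ln 2 · n) and a +1 pseudocount per base to avoid log2(0). The function names, the 0-bit default threshold and the 1-bit cutoff for an "altered" site are hypothetical choices made for illustration.

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def small_sample_correction(n: int) -> float:
    # Assumption: large-sample approximation e(n) ~ (s - 1) / (2 ln 2 * n), s = 4 bases.
    return 3.0 / (2.0 * math.log(2) * n)


def ri_weight_matrix(sites: list[str]) -> list[dict[str, float]]:
    """Riw(b, l) = 2 + log2 f(b, l) - e(n) for aligned, equal-length sites (bits)."""
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("sites must be aligned and of equal length")
    n = len(sites)
    e_n = small_sample_correction(n)
    matrix = []
    for l in range(width):
        column = [s[l].upper() for s in sites]
        # Assumption: +1 pseudocount per base so that unseen bases do not give log2(0).
        matrix.append({b: 2.0 + math.log2((column.count(b) + 1) / (n + 4)) - e_n
                       for b in BASES})
    return matrix


def individual_information(matrix: list[dict[str, float]], seq: str) -> float:
    """Ri of one site: the sum of Riw(b_l, l) over its positions."""
    return sum(matrix[l][b] for l, b in enumerate(seq))


def scan(matrix, genome: str, threshold: float = 0.0):
    """Score every window on both strands; return (start, strand, Ri) hits above
    threshold, strongest first. Start is the 0-based window start on the + strand."""
    width = len(matrix)
    genome = genome.upper()
    hits = []
    for start in range(len(genome) - width + 1):
        window = genome[start:start + width]
        if any(b not in BASES for b in window):
            continue  # skip windows with N or other ambiguity codes
        reverse = window.translate(COMPLEMENT)[::-1]  # reverse complement
        for strand, site in (("+", window), ("-", reverse)):
            ri = individual_information(matrix, site)
            if ri > threshold:
                hits.append((start, strand, round(ri, 2)))
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


def compare_scans(hits_a, hits_b, min_delta: float = 1.0):
    """Sites found only by scan A, only by scan B, and shared sites whose Ri
    differs by at least min_delta bits (hypothetical cutoff)."""
    a = {(start, strand): ri for start, strand, ri in hits_a}
    b = {(start, strand): ri for start, strand, ri in hits_b}
    only_a = sorted(a.keys() - b.keys())
    only_b = sorted(b.keys() - a.keys())
    changed = sorted(k for k in a.keys() & b.keys() if abs(a[k] - b[k]) >= min_delta)
    return only_a, only_b, changed


if __name__ == "__main__":
    published = ["AGGTAAGT", "AGGTGAGT", "AGGTAAGA", "CGGTAAGT"]  # toy sites
    original = ri_weight_matrix(published)
    refined = ri_weight_matrix(published + ["AGGTAAGT"])  # add a "verified" site
    genome = "CCCCAGGTAAGTCCCC"
    print(compare_scans(scan(original, genome), scan(refined, genome)))
```

Refining a matrix in this sketch means calling `ri_weight_matrix` on the published sites plus newly verified ones, scanning with both the original and the refined matrix, and passing the two hit lists to `compare_scans` to list sites unique to each scan and shared sites whose strength changed.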