Statistical extraction of Drosophila cis-regulatory modules using exhaustive assessment of local word frequency
BACKGROUND: Transcription regulatory regions in higher eukaryotes are often represented by cis-regulatory modules (CRM) and are responsible for the formation of specific spatial and temporal gene expression patterns. These extended, ~1 KB, regions are found far from coding sequences and cannot be ex...
Main authors: | Nazina, Anna G; Papatsenko, Dmitri A |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2003 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC341902/ https://www.ncbi.nlm.nih.gov/pubmed/14690551 http://dx.doi.org/10.1186/1471-2105-4-65 |
_version_ | 1782121229739622400 |
---|---|
author | Nazina, Anna G Papatsenko, Dmitri A |
author_facet | Nazina, Anna G Papatsenko, Dmitri A |
author_sort | Nazina, Anna G |
collection | PubMed |
description | BACKGROUND: Transcription regulatory regions in higher eukaryotes are often represented by cis-regulatory modules (CRM) and are responsible for the formation of specific spatial and temporal gene expression patterns. These extended, ~1 KB, regions are found far from coding sequences and cannot be extracted from the genome on the basis of their position relative to the coding regions. RESULTS: To explore the feasibility of CRM extraction from a genome, we generated an original training set containing annotated sequence data for most of the known developmental CRMs from Drosophila. Based on this set of experimental data, we developed a strategy for statistical extraction of cis-regulatory modules from the genome, using exhaustive analysis of local word frequency (LWF). To assess the performance of our analysis, we measured the correlation between predictions generated by the LWF algorithm and the distribution of conserved non-coding regions in a number of Drosophila developmental genes. CONCLUSIONS: In most of the cases tested, we observed high correlation (up to 0.6–0.8, measured on the entire gene locus) between the two independent techniques. We discuss computational strategies available for extraction of Drosophila CRMs and possible extensions of these methods. |
format | Text |
id | pubmed-341902 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2003 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3419022004-02-17 Statistical extraction of Drosophila cis-regulatory modules using exhaustive assessment of local word frequency Nazina, Anna G Papatsenko, Dmitri A BMC Bioinformatics Research Article BACKGROUND: Transcription regulatory regions in higher eukaryotes are often represented by cis-regulatory modules (CRM) and are responsible for the formation of specific spatial and temporal gene expression patterns. These extended, ~1 KB, regions are found far from coding sequences and cannot be extracted from genome on the basis of their relative position to the coding regions. RESULTS: To explore the feasibility of CRM extraction from a genome, we generated an original training set, containing annotated sequence data for most of the known developmental CRMs from Drosophila. Based on this set of experimental data, we developed a strategy for statistical extraction of cis-regulatory modules from the genome, using exhaustive analysis of local word frequency (LWF). To assess the performance of our analysis, we measured the correlation between predictions generated by the LWF algorithm and the distribution of conserved non-coding regions in a number of Drosophila developmental genes. CONCLUSIONS: In most of the cases tested, we observed high correlation (up to 0.6–0.8, measured on the entire gene locus) between the two independent techniques. We discuss computational strategies available for extraction of Drosophila CRMs and possible extensions of these methods. BioMed Central 2003-12-22 /pmc/articles/PMC341902/ /pubmed/14690551 http://dx.doi.org/10.1186/1471-2105-4-65 Text en Copyright © 2003 Nazina and Papatsenko; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL. |
spellingShingle | Research Article Nazina, Anna G Papatsenko, Dmitri A Statistical extraction of Drosophila cis-regulatory modules using exhaustive assessment of local word frequency |
title | Statistical extraction of Drosophila cis-regulatory modules using exhaustive assessment of local word frequency |
title_sort | statistical extraction of drosophila cis-regulatory modules using exhaustive assessment of local word frequency |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC341902/ https://www.ncbi.nlm.nih.gov/pubmed/14690551 http://dx.doi.org/10.1186/1471-2105-4-65 |
work_keys_str_mv | AT nazinaannag statisticalextractionofdrosophilacisregulatorymodulesusingexhaustiveassessmentoflocalwordfrequency AT papatsenkodmitria statisticalextractionofdrosophilacisregulatorymodulesusingexhaustiveassessmentoflocalwordfrequency |
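
The abstract above mentions two ingredients: a per-locus profile derived from local word frequency, and a correlation between that profile and a conservation profile. The record does not give the details of the LWF algorithm, so the following is only a minimal sketch of the general idea and not the authors' method. The function names, the word length `k`, the window and step sizes, and the scoring rule (counting repeated words inside a window) are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def window_word_scores(seq, k=4, window=20, step=10):
    """Score each window by how many repeated k-letter words it contains.

    Hypothetical scoring rule for illustration only; it is not the
    published LWF statistic.
    """
    scores = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        counts = Counter(chunk[i:i + k] for i in range(len(chunk) - k + 1))
        # For every distinct word, count its occurrences beyond the first.
        scores.append(sum(c - 1 for c in counts.values()))
    return scores


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)
```

Under these assumptions, one would compute `window_word_scores` over a gene locus, build a conservation score for the same windows from an independent source, and pass both lists to `pearson` to obtain a single locus-wide correlation of the kind reported in the abstract.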