
Statistical extraction of Drosophila cis-regulatory modules using exhaustive assessment of local word frequency

BACKGROUND: Transcription regulatory regions in higher eukaryotes are often represented by cis-regulatory modules (CRM) and are responsible for the formation of specific spatial and temporal gene expression patterns. These extended, ~1 KB, regions are found far from coding sequences and cannot be ex...


Bibliographic Details
Main authors: Nazina, Anna G; Papatsenko, Dmitri A
Format: Text
Language: English
Published: BioMed Central 2003
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC341902/
https://www.ncbi.nlm.nih.gov/pubmed/14690551
http://dx.doi.org/10.1186/1471-2105-4-65
_version_ 1782121229739622400
author Nazina, Anna G
Papatsenko, Dmitri A
author_facet Nazina, Anna G
Papatsenko, Dmitri A
author_sort Nazina, Anna G
collection PubMed
description BACKGROUND: Transcription regulatory regions in higher eukaryotes are often represented by cis-regulatory modules (CRM) and are responsible for the formation of specific spatial and temporal gene expression patterns. These extended, ~1 KB, regions are found far from coding sequences and cannot be extracted from genome on the basis of their relative position to the coding regions. RESULTS: To explore the feasibility of CRM extraction from a genome, we generated an original training set, containing annotated sequence data for most of the known developmental CRMs from Drosophila. Based on this set of experimental data, we developed a strategy for statistical extraction of cis-regulatory modules from the genome, using exhaustive analysis of local word frequency (LWF). To assess the performance of our analysis, we measured the correlation between predictions generated by the LWF algorithm and the distribution of conserved non-coding regions in a number of Drosophila developmental genes. CONCLUSIONS: In most of the cases tested, we observed high correlation (up to 0.6–0.8, measured on the entire gene locus) between the two independent techniques. We discuss computational strategies available for extraction of Drosophila CRMs and possible extensions of these methods.
format Text
id pubmed-341902
institution National Center for Biotechnology Information
language English
publishDate 2003
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-341902 2004-02-17 Statistical extraction of Drosophila cis-regulatory modules using exhaustive assessment of local word frequency Nazina, Anna G Papatsenko, Dmitri A BMC Bioinformatics Research Article BACKGROUND: Transcription regulatory regions in higher eukaryotes are often represented by cis-regulatory modules (CRM) and are responsible for the formation of specific spatial and temporal gene expression patterns. These extended, ~1 KB, regions are found far from coding sequences and cannot be extracted from genome on the basis of their relative position to the coding regions. RESULTS: To explore the feasibility of CRM extraction from a genome, we generated an original training set, containing annotated sequence data for most of the known developmental CRMs from Drosophila. Based on this set of experimental data, we developed a strategy for statistical extraction of cis-regulatory modules from the genome, using exhaustive analysis of local word frequency (LWF). To assess the performance of our analysis, we measured the correlation between predictions generated by the LWF algorithm and the distribution of conserved non-coding regions in a number of Drosophila developmental genes. CONCLUSIONS: In most of the cases tested, we observed high correlation (up to 0.6–0.8, measured on the entire gene locus) between the two independent techniques. We discuss computational strategies available for extraction of Drosophila CRMs and possible extensions of these methods. BioMed Central 2003-12-22 /pmc/articles/PMC341902/ /pubmed/14690551 http://dx.doi.org/10.1186/1471-2105-4-65 Text en Copyright © 2003 Nazina and Papatsenko; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.
spellingShingle Research Article
Nazina, Anna G
Papatsenko, Dmitri A
Statistical extraction of Drosophila cis-regulatory modules using exhaustive assessment of local word frequency
title Statistical extraction of Drosophila cis-regulatory modules using exhaustive assessment of local word frequency
title_sort statistical extraction of drosophila cis-regulatory modules using exhaustive assessment of local word frequency
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC341902/
https://www.ncbi.nlm.nih.gov/pubmed/14690551
http://dx.doi.org/10.1186/1471-2105-4-65
work_keys_str_mv AT nazinaannag statisticalextractionofdrosophilacisregulatorymodulesusingexhaustiveassessmentoflocalwordfrequency
AT papatsenkodmitria statisticalextractionofdrosophilacisregulatorymodulesusingexhaustiveassessmentoflocalwordfrequency