Determining significance of pairwise co-occurrences of events in bursty sequences

BACKGROUND: Event sequences where different types of events often occur close together arise, e.g., when studying potential transcription factor binding sites (TFBS, events) of certain transcription factors (TF, types) in a DNA sequence. These events tend to occur in bursts: in some genomic regions...

Bibliographic Details
Main authors: Haiminen, Niina, Mannila, Heikki, Terzi, Evimaria
Format: Text
Language: English
Published: BioMed Central 2008
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2547115/
https://www.ncbi.nlm.nih.gov/pubmed/18691400
http://dx.doi.org/10.1186/1471-2105-9-336
author Haiminen, Niina
Mannila, Heikki
Terzi, Evimaria
collection PubMed
description BACKGROUND: Event sequences where different types of events often occur close together arise, e.g., when studying potential transcription factor binding sites (TFBS, events) of certain transcription factors (TF, types) in a DNA sequence. These events tend to occur in bursts: in some genomic regions there are more genes and therefore potentially more binding sites, while in some, possibly very long regions, hardly any events occur. Also some types of events may occur in the sequence more often than others. Tendencies of co-occurrence of binding sites of two or more TFs are interesting, as they may imply a co-operative role between the TFs in regulatory processes. Determining a numerical value to summarize the tendency for co-occurrence between two TFs can be done in a number of ways. However, testing for the significance of such values should be done with respect to a relevant null model that takes into account the global sequence structure. RESULTS: We extend the existing techniques that have been considered for determining the significance of co-occurrence patterns between a pair of event types under different null models. These models range from very simple ones to more complex models that take the burstiness of sequences into account. We evaluate the models and techniques on synthetic event sequences, and on real data consisting of potential transcription factor binding sites. CONCLUSION: We show that simple null models are poorly suited for bursty data, and they yield many false positives. More sophisticated models give better results in our experiments. We also demonstrate the effect of the window size, i.e., maximum co-occurrence distance, on the significance results.
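The abstract describes scoring co-occurrence of two event types within a window and testing that score against a null model. The following is a minimal Python sketch of that general idea, not the authors' implementation: the function names, the "A event with a B event within `window`" score, and the label-shuffling null model (which keeps all event positions fixed, so bursts are preserved) are assumptions chosen for illustration.

```python
import bisect
import random


def cooccurrence_count(pos_a, pos_b, window):
    """Count type-A events that have at least one type-B event within `window`."""
    pos_b = sorted(pos_b)
    count = 0
    for p in pos_a:
        # First B position that is >= p - window.
        i = bisect.bisect_left(pos_b, p - window)
        if i < len(pos_b) and pos_b[i] <= p + window:
            count += 1
    return count


def label_shuffle_pvalue(pos_a, pos_b, window, n_perm=1000, seed=0):
    """Empirical p-value under an assumed null model: event positions stay
    fixed (bursts are kept), only the A/B labels are randomly reassigned."""
    rng = random.Random(seed)
    observed = cooccurrence_count(pos_a, pos_b, window)
    positions = list(pos_a) + list(pos_b)
    n_a = len(pos_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(positions)
        score = cooccurrence_count(positions[:n_a], positions[n_a:], window)
        if score >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical positions, for illustration only.
    a = [10, 50, 200]
    b = [12, 300]
    for w in (5, 100):
        print(w, label_shuffle_pvalue(a, b, w, n_perm=200))
```

Running the sketch with two window sizes shows how the window (the maximum co-occurrence distance) changes the observed score, and therefore the resulting significance estimate.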
format Text
id pubmed-2547115
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2547115 2008-09-23 Determining significance of pairwise co-occurrences of events in bursty sequences Haiminen, Niina Mannila, Heikki Terzi, Evimaria BMC Bioinformatics Research Article
BioMed Central 2008-08-08 /pmc/articles/PMC2547115/ /pubmed/18691400 http://dx.doi.org/10.1186/1471-2105-9-336 Text en Copyright © 2008 Haiminen et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Determining significance of pairwise co-occurrences of events in bursty sequences
topic Research Article