
An improved compound Poisson model for the number of motif hits in DNA sequences

MOTIVATION: Transcription factors play a crucial role in gene regulation by binding to specific regulatory sequences. The sequence motifs recognized by a transcription factor can be described in terms of position frequency matrices. When scanning a sequence for matches to a position frequency matrix...


Bibliographic Details
Main authors: Kopp, Wolfgang; Vingron, Martin
Format: Online Article Text
Language: English
Published: Oxford University Press 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5860096/
https://www.ncbi.nlm.nih.gov/pubmed/28961747
http://dx.doi.org/10.1093/bioinformatics/btx539
author Kopp, Wolfgang
Vingron, Martin
collection PubMed
description MOTIVATION: Transcription factors play a crucial role in gene regulation by binding to specific regulatory sequences. The sequence motifs recognized by a transcription factor can be described in terms of position frequency matrices. When scanning a sequence for matches to a position frequency matrix, one needs to determine a cut-off, which then in turn results in a certain number of hits. In this paper we describe how to compute the distribution of match scores and of the number of motif hits, which are the prerequisites to perform motif hit enrichment analysis. RESULTS: We put forward an improved compound Poisson model that supports general order-d Markov background models and which computes the number of motif hits more accurately than earlier models. We compared the accuracy of the improved compound Poisson model with previously proposed models across a range of parameters and motifs, demonstrating the improvement. The importance of the order-d model is supported in a case study using CpG-island sequences. AVAILABILITY AND IMPLEMENTATION: The method is available as a Bioconductor package named 'motifcounter' at https://bioconductor.org/packages/motifcounter. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-5860096
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-5860096 2018-03-23 Bioinformatics Original Papers Oxford University Press 2017-12-15 2017-08-28 /pmc/articles/PMC5860096/ /pubmed/28961747 http://dx.doi.org/10.1093/bioinformatics/btx539 Text en © The Author 2017. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title An improved compound Poisson model for the number of motif hits in DNA sequences
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5860096/
https://www.ncbi.nlm.nih.gov/pubmed/28961747
http://dx.doi.org/10.1093/bioinformatics/btx539