MODER2: first-order Markov modeling and discovery of monomeric and dimeric binding motifs

MOTIVATION: Position-specific probability matrices (PPMs, also called position-specific weight matrices) have been the dominating model for transcription factor (TF)-binding motifs in DNA. There is, however, increasing recent evidence of better performance of higher order models such as Markov model...

Bibliographic Details
Main authors: Toivonen, Jarkko, Das, Pratyush K, Taipale, Jussi, Ukkonen, Esko
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7203737/
https://www.ncbi.nlm.nih.gov/pubmed/31999322
http://dx.doi.org/10.1093/bioinformatics/btaa045
author Toivonen, Jarkko
Das, Pratyush K
Taipale, Jussi
Ukkonen, Esko
collection PubMed
description MOTIVATION: Position-specific probability matrices (PPMs, also called position-specific weight matrices) have been the dominating model for transcription factor (TF)-binding motifs in DNA. There is, however, increasing recent evidence of better performance of higher order models such as Markov models of order one, also called adjacent dinucleotide matrices (ADMs). ADMs can model dependencies between adjacent nucleotides, unlike PPMs. A modeling technique and software tool that would estimate such models simultaneously both for monomers and their dimers have been missing. RESULTS: We present an ADM-based mixture model for monomeric and dimeric TF-binding motifs and an expectation maximization algorithm MODER2 for learning such models from training data and seeds. The model is a mixture that includes monomers and dimers, built from the monomers, with a description of the dimeric structure (spacing, orientation). The technique is modular, meaning that the co-operative effect of dimerization is made explicit by evaluating the difference between expected and observed models. The model is validated using HT-SELEX and generated datasets, and by comparing to some earlier PPM and ADM techniques. The ADM models explain data slightly better than PPM models for 314 tested TFs (or their DNA-binding domains) from four families (bHLH, bZIP, ETS and Homeodomain), the ADM mixture models by MODER2 being the best on average. AVAILABILITY AND IMPLEMENTATION: Software implementation is available from https://github.com/jttoivon/moder2. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
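The contrast the abstract draws between PPMs and ADMs can be illustrated with a minimal sketch. This is not the MODER2 implementation: the function names `ppm_prob` and `adm_prob`, the data layout, and the toy two-position motif are all assumptions made for illustration, and the numbers are invented. The example assumes binding sites that are always either `AT` or `TA`, so the second base depends entirely on the first.

```python
# Hypothetical toy example, not taken from MODER2.
BASES = "ACGT"

def ppm_prob(seq, ppm):
    """Probability of seq under a PPM: each position is independent."""
    p = 1.0
    for i, base in enumerate(seq):
        p *= ppm[i][base]
    return p

def adm_prob(seq, initial, transitions):
    """Probability of seq under a first-order Markov model (ADM).

    initial[b] is P(first base = b); transitions[i][a][b] is
    P(base at position i+1 = b | base at position i = a).
    """
    p = initial[seq[0]]
    for i in range(len(seq) - 1):
        p *= transitions[i][seq[i]][seq[i + 1]]
    return p

# Assumed training sites: "AT" and "TA", equally frequent.
column = {"A": 0.5, "C": 0.0, "G": 0.0, "T": 0.5}
ppm = [column, column]  # per-position base frequencies only

uniform = {b: 0.25 for b in BASES}  # placeholder rows for unseen first bases
initial = dict(column)
transitions = [{
    "A": {"A": 0.0, "C": 0.0, "G": 0.0, "T": 1.0},  # A is always followed by T
    "T": {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},  # T is always followed by A
    "C": uniform,
    "G": uniform,
}]

for site in ("AT", "AA"):
    print(site, ppm_prob(site, ppm), adm_prob(site, initial, transitions))
```

In this toy setting the PPM cannot tell the observed site `AT` from the never-observed `AA`, because it only stores per-position frequencies, whereas the first-order model assigns `AA` zero probability. A practical implementation would likely work with log-probabilities and pseudocounts rather than raw products and hard zeros.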
format Online
Article
Text
id pubmed-7203737
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-7203737 2020-05-11 Bioinformatics Original Papers
Oxford University Press 2020-05-01 2020-01-30 /pmc/articles/PMC7203737/ /pubmed/31999322 http://dx.doi.org/10.1093/bioinformatics/btaa045 Text en © The Author(s) 2020. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title MODER2: first-order Markov modeling and discovery of monomeric and dimeric binding motifs
topic Original Papers