MODER2: first-order Markov modeling and discovery of monomeric and dimeric binding motifs
MOTIVATION: Position-specific probability matrices (PPMs, also called position-specific weight matrices) have been the dominating model for transcription factor (TF)-binding motifs in DNA. There is, however, increasing recent evidence of better performance of higher order models such as Markov model...
Main authors: | Toivonen, Jarkko; Das, Pratyush K; Taipale, Jussi; Ukkonen, Esko |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2020 |
Subjects: | Original Papers |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7203737/ https://www.ncbi.nlm.nih.gov/pubmed/31999322 http://dx.doi.org/10.1093/bioinformatics/btaa045 |
_version_ | 1783529924473126912 |
---|---|
author | Toivonen, Jarkko Das, Pratyush K Taipale, Jussi Ukkonen, Esko |
author_facet | Toivonen, Jarkko Das, Pratyush K Taipale, Jussi Ukkonen, Esko |
author_sort | Toivonen, Jarkko |
collection | PubMed |
description | MOTIVATION: Position-specific probability matrices (PPMs, also called position-specific weight matrices) have been the dominating model for transcription factor (TF)-binding motifs in DNA. There is, however, increasing recent evidence of better performance of higher order models such as Markov models of order one, also called adjacent dinucleotide matrices (ADMs). ADMs can model dependencies between adjacent nucleotides, unlike PPMs. A modeling technique and software tool that would estimate such models simultaneously both for monomers and their dimers have been missing. RESULTS: We present an ADM-based mixture model for monomeric and dimeric TF-binding motifs and an expectation maximization algorithm MODER2 for learning such models from training data and seeds. The model is a mixture that includes monomers and dimers, built from the monomers, with a description of the dimeric structure (spacing, orientation). The technique is modular, meaning that the co-operative effect of dimerization is made explicit by evaluating the difference between expected and observed models. The model is validated using HT-SELEX and generated datasets, and by comparing to some earlier PPM and ADM techniques. The ADM models explain data slightly better than PPM models for 314 tested TFs (or their DNA-binding domains) from four families (bHLH, bZIP, ETS and Homeodomain), the ADM mixture models by MODER2 being the best on average. AVAILABILITY AND IMPLEMENTATION: Software implementation is available from https://github.com/jttoivon/moder2. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-7203737 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-7203737 2020-05-11 MODER2: first-order Markov modeling and discovery of monomeric and dimeric binding motifs Toivonen, Jarkko Das, Pratyush K Taipale, Jussi Ukkonen, Esko Bioinformatics Original Papers
Oxford University Press 2020-05-01 2020-01-30 /pmc/articles/PMC7203737/ /pubmed/31999322 http://dx.doi.org/10.1093/bioinformatics/btaa045 Text en © The Author(s) 2020. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Original Papers Toivonen, Jarkko Das, Pratyush K Taipale, Jussi Ukkonen, Esko MODER2: first-order Markov modeling and discovery of monomeric and dimeric binding motifs |
title | MODER2: first-order Markov modeling and discovery of monomeric and dimeric binding motifs |
title_full | MODER2: first-order Markov modeling and discovery of monomeric and dimeric binding motifs |
title_fullStr | MODER2: first-order Markov modeling and discovery of monomeric and dimeric binding motifs |
title_full_unstemmed | MODER2: first-order Markov modeling and discovery of monomeric and dimeric binding motifs |
title_short | MODER2: first-order Markov modeling and discovery of monomeric and dimeric binding motifs |
title_sort | moder2: first-order markov modeling and discovery of monomeric and dimeric binding motifs |
topic | Original Papers |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7203737/ https://www.ncbi.nlm.nih.gov/pubmed/31999322 http://dx.doi.org/10.1093/bioinformatics/btaa045 |
work_keys_str_mv | AT toivonenjarkko moder2firstordermarkovmodelinganddiscoveryofmonomericanddimericbindingmotifs AT daspratyushk moder2firstordermarkovmodelinganddiscoveryofmonomericanddimericbindingmotifs AT taipalejussi moder2firstordermarkovmodelinganddiscoveryofmonomericanddimericbindingmotifs AT ukkonenesko moder2firstordermarkovmodelinganddiscoveryofmonomericanddimericbindingmotifs |
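
The abstract's contrast between PPMs and ADMs can be illustrated with a small sketch. It is not taken from the MODER2 software: the function names `ppm_prob` and `adm_prob`, the data layout, and all numbers are hypothetical, and the ADM is written in one common parameterization of a first-order Markov model (an initial distribution plus per-position transition probabilities), which may differ from how the paper stores its matrices.

```python
from math import prod

BASES = "ACGT"

def ppm_prob(ppm, seq):
    """Probability of seq under a PPM: every position is independent."""
    return prod(col[base] for col, base in zip(ppm, seq))

def adm_prob(initial, transitions, seq):
    """Probability of seq under a first-order Markov model (ADM).

    initial[b] is P(first base = b); transitions[i][a][b] is
    P(base at position i+1 = b | base at position i = a).
    """
    p = initial[seq[0]]
    for i in range(len(seq) - 1):
        p *= transitions[i][seq[i]][seq[i + 1]]
    return p

# Toy length-2 site in which only "AA" and "TT" occur, equally often.
# Hypothetical numbers chosen for illustration, not taken from the paper.
ppm = [
    {"A": 0.5, "C": 0.0, "G": 0.0, "T": 0.5},
    {"A": 0.5, "C": 0.0, "G": 0.0, "T": 0.5},
]
initial = {"A": 0.5, "C": 0.0, "G": 0.0, "T": 0.5}
uniform = {b: 0.25 for b in BASES}
transitions = [{
    "A": {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},
    "C": uniform,
    "G": uniform,
    "T": {"A": 0.0, "C": 0.0, "G": 0.0, "T": 1.0},
}]

for seq in ("AA", "AT", "TA", "TT"):
    print(seq, ppm_prob(ppm, seq), adm_prob(initial, transitions, seq))
```

In this toy case both models share the same per-position base frequencies, yet the PPM assigns probability 0.25 to each of the four sequences, while the ADM assigns 0.5 to `AA` and `TT` and 0 to `AT` and `TA`. That difference is the adjacent-nucleotide dependency a PPM cannot express.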