MODER2: first-order Markov modeling and discovery of monomeric and dimeric binding motifs

MOTIVATION: Position-specific probability matrices (PPMs, also called position-specific weight matrices) have been the dominating model for transcription factor (TF)-binding motifs in DNA. There is, however, increasing recent evidence of better performance of higher order models such as Markov model...

Bibliographic Details
Main Authors:	Toivonen, Jarkko; Das, Pratyush K; Taipale, Jussi; Ukkonen, Esko
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2020
Subjects:	Original Papers
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7203737/ https://www.ncbi.nlm.nih.gov/pubmed/31999322 http://dx.doi.org/10.1093/bioinformatics/btaa045

author	Toivonen, Jarkko; Das, Pratyush K; Taipale, Jussi; Ukkonen, Esko
collection	PubMed
description	MOTIVATION: Position-specific probability matrices (PPMs, also called position-specific weight matrices) have been the dominating model for transcription factor (TF)-binding motifs in DNA. There is, however, increasing recent evidence of better performance of higher order models such as Markov models of order one, also called adjacent dinucleotide matrices (ADMs). ADMs can model dependencies between adjacent nucleotides, unlike PPMs. A modeling technique and software tool that would estimate such models simultaneously both for monomers and their dimers have been missing. RESULTS: We present an ADM-based mixture model for monomeric and dimeric TF-binding motifs and an expectation maximization algorithm MODER2 for learning such models from training data and seeds. The model is a mixture that includes monomers and dimers, built from the monomers, with a description of the dimeric structure (spacing, orientation). The technique is modular, meaning that the co-operative effect of dimerization is made explicit by evaluating the difference between expected and observed models. The model is validated using HT-SELEX and generated datasets, and by comparing to some earlier PPM and ADM techniques. The ADM models explain data slightly better than PPM models for 314 tested TFs (or their DNA-binding domains) from four families (bHLH, bZIP, ETS and Homeodomain), the ADM mixture models by MODER2 being the best on average. AVAILABILITY AND IMPLEMENTATION: Software implementation is available from https://github.com/jttoivon/moder2. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format	Online Article Text
id	pubmed-7203737
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-7203737 2020-05-11 MODER2: first-order Markov modeling and discovery of monomeric and dimeric binding motifs Toivonen, Jarkko; Das, Pratyush K; Taipale, Jussi; Ukkonen, Esko Bioinformatics Original Papers
Oxford University Press 2020-05-01 2020-01-30 /pmc/articles/PMC7203737/ /pubmed/31999322 http://dx.doi.org/10.1093/bioinformatics/btaa045 Text en © The Author(s) 2020. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title	MODER2: first-order Markov modeling and discovery of monomeric and dimeric binding motifs
topic	Original Papers
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7203737/ https://www.ncbi.nlm.nih.gov/pubmed/31999322 http://dx.doi.org/10.1093/bioinformatics/btaa045
