Metamotifs - a generative model for building families of nucleotide position weight matrices

BACKGROUND: Development of high-throughput methods for measuring DNA interactions of transcription factors together with computational advances in short motif inference algorithms is expanding our understanding of transcription factor binding site motifs. The consequential growth of sequence motif d...

Bibliographic Details
Main authors: Piipari, Matias, Down, Thomas A, Hubbard, Tim JP
Format: Text
Language: English
Published: BioMed Central 2010
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2906491/
https://www.ncbi.nlm.nih.gov/pubmed/20579334
http://dx.doi.org/10.1186/1471-2105-11-348
author Piipari, Matias
Down, Thomas A
Hubbard, Tim JP
collection PubMed
description BACKGROUND: Development of high-throughput methods for measuring DNA interactions of transcription factors, together with computational advances in short motif inference algorithms, is expanding our understanding of transcription factor binding site motifs. The consequent growth of sequence motif data sets makes it important to systematically group and categorise regulatory motifs. It has been shown that there are familial tendencies in DNA sequence motifs that are predictive of the family of factors that binds them. Further development of methods that detect and describe familial motif trends has the potential to help in measuring the similarity of novel computational motif predictions to previously known data, and in sensitively detecting regulatory motifs similar to previously known ones in novel sequence. RESULTS: We propose a probabilistic model for position weight matrix (PWM) sequence motif families. The model, which we call the 'metamotif', describes recurring familial patterns in a set of motifs. The metamotif framework models variation within a family of sequence motifs. It allows for simultaneous estimation of a series of independent metamotifs from input PWM motif data and does not assume that all input motif columns contribute to a familial pattern. We describe an algorithm for inferring metamotifs from weight matrix data. We then demonstrate the use of the model in two practical tasks: in the Bayesian NestedMICA model inference algorithm, as a PWM prior to enhance motif inference sensitivity, and in a motif classification task where motifs are labelled according to their interacting DNA binding domain. CONCLUSIONS: We show that metamotifs can be used as PWM priors in the NestedMICA motif inference algorithm to dramatically increase the sensitivity of motif inference. Metamotifs were also successfully applied to a motif classification problem where sequence motif features were used to predict the family of protein DNA binding domains that would interact with them. The metamotif based classifier is shown to compare favourably to previous related methods. The metamotif has great potential for further use in machine learning tasks, especially those related to de novo computational sequence motif inference. The metamotif methods presented have been incorporated into the NestedMICA suite.
format Text
id pubmed-2906491
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2906491 2010-07-20 Metamotifs - a generative model for building families of nucleotide position weight matrices Piipari, Matias Down, Thomas A Hubbard, Tim JP BMC Bioinformatics Methodology Article BioMed Central 2010-06-25 /pmc/articles/PMC2906491/ /pubmed/20579334 http://dx.doi.org/10.1186/1471-2105-11-348 Text en Copyright ©2010 Piipari et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Metamotifs - a generative model for building families of nucleotide position weight matrices
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2906491/
https://www.ncbi.nlm.nih.gov/pubmed/20579334
http://dx.doi.org/10.1186/1471-2105-11-348