
Position Weight Matrix, Gibbs Sampler, and the Associated Significance Tests in Motif Characterization and Prediction

Position weight matrix (PWM) is not only one of the most widely used bioinformatic methods, but also a key component in more advanced computational algorithms (e.g., Gibbs sampler) for characterizing and discovering motifs in nucleotide or amino acid sequences. However, few generally applicable stat...


Bibliographic Details
Main Author: Xia, Xuhua
Format: Online Article Text
Language: English
Published: Hindawi Publishing Corporation 2012
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3820676/
https://www.ncbi.nlm.nih.gov/pubmed/24278755
http://dx.doi.org/10.6064/2012/917540
author Xia, Xuhua
collection PubMed
description Position weight matrix (PWM) is not only one of the most widely used bioinformatic methods, but also a key component in more advanced computational algorithms (e.g., Gibbs sampler) for characterizing and discovering motifs in nucleotide or amino acid sequences. However, few generally applicable statistical tests are available for evaluating the significance of site patterns, PWM, and PWM scores (PWMS) of putative motifs. Statistical significance tests of the PWM output, that is, site-specific frequencies, PWM itself, and PWMS, are in disparate sources and have never been collected in a single paper, with the consequence that many implementations of PWM do not include any significance test. Here I review PWM-based methods used in motif characterization and prediction (including a detailed illustration of the Gibbs sampler for de novo motif discovery), present statistical and probabilistic rationales behind statistical significance tests relevant to PWM, and illustrate their application with real data. The multiple comparison problem associated with the test of site-specific frequencies is best handled by false discovery rate methods. The test of PWM, due to the use of pseudocounts, is best done by resampling methods. The test of individual PWMS for each sequence segment should be based on the extreme value distribution.
format Online
Article
Text
id pubmed-3820676
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher Hindawi Publishing Corporation
record_format MEDLINE/PubMed
spelling pubmed-3820676 2013-11-25 Xia, Xuhua Scientifica (Cairo) Review Article Hindawi Publishing Corporation 2012 2012-10-23 /pmc/articles/PMC3820676/ /pubmed/24278755 http://dx.doi.org/10.6064/2012/917540 Text en Copyright © 2012 Xuhua Xia. https://creativecommons.org/licenses/by/3.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Review Article