Position Weight Matrix, Gibbs Sampler, and the Associated Significance Tests in Motif Characterization and Prediction
Position weight matrix (PWM) is not only one of the most widely used bioinformatic methods, but also a key component in more advanced computational algorithms (e.g., Gibbs sampler) for characterizing and discovering motifs in nucleotide or amino acid sequences. However, few generally applicable stat...
Main author: | Xia, Xuhua |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Hindawi Publishing Corporation, 2012 |
Subjects: | Review Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3820676/ https://www.ncbi.nlm.nih.gov/pubmed/24278755 http://dx.doi.org/10.6064/2012/917540 |
_version_ | 1782290180529455104 |
---|---|
author | Xia, Xuhua |
collection | PubMed |
description | Position weight matrix (PWM) is not only one of the most widely used bioinformatic methods, but also a key component in more advanced computational algorithms (e.g., Gibbs sampler) for characterizing and discovering motifs in nucleotide or amino acid sequences. However, few generally applicable statistical tests are available for evaluating the significance of site patterns, PWM, and PWM scores (PWMS) of putative motifs. Statistical significance tests of the PWM output, that is, site-specific frequencies, PWM itself, and PWMS, are in disparate sources and have never been collected in a single paper, with the consequence that many implementations of PWM do not include any significance test. Here I review PWM-based methods used in motif characterization and prediction (including a detailed illustration of the Gibbs sampler for de novo motif discovery), present statistical and probabilistic rationales behind statistical significance tests relevant to PWM, and illustrate their application with real data. The multiple comparison problem associated with the test of site-specific frequencies is best handled by false discovery rate methods. The test of PWM, due to the use of pseudocounts, is best done by resampling methods. The test of individual PWMS for each sequence segment should be based on the extreme value distribution. |
format | Online Article Text |
id | pubmed-3820676 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2012 |
publisher | Hindawi Publishing Corporation |
record_format | MEDLINE/PubMed |
spelling | pubmed-3820676 2013-11-25 Position Weight Matrix, Gibbs Sampler, and the Associated Significance Tests in Motif Characterization and Prediction Xia, Xuhua Scientifica (Cairo) Review Article Hindawi Publishing Corporation 2012 2012-10-23 /pmc/articles/PMC3820676/ /pubmed/24278755 http://dx.doi.org/10.6064/2012/917540 Text en Copyright © 2012 Xuhua Xia. https://creativecommons.org/licenses/by/3.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Position Weight Matrix, Gibbs Sampler, and the Associated Significance Tests in Motif Characterization and Prediction |
topic | Review Article |