PSI-BLAST pseudocounts and the minimum description length principle
Position specific score matrices (PSSMs) are derived from multiple sequence alignments to aid in the recognition of distant protein sequence relationships. The PSI-BLAST protein database search program derives the column scores of its PSSMs with the aid of pseudocounts, added to the observed amino a...
Main authors: | Altschul, Stephen F.; Gertz, E. Michael; Agarwala, Richa; Schäffer, Alejandro A.; Yu, Yi-Kuo |
---|---|
Format: | Text |
Language: | English |
Published: | Oxford University Press, 2009 |
Subjects: | Computational Biology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2647318/ https://www.ncbi.nlm.nih.gov/pubmed/19088134 http://dx.doi.org/10.1093/nar/gkn981 |
_version_ | 1782164920953995264 |
---|---|
author | Altschul, Stephen F. Gertz, E. Michael Agarwala, Richa Schäffer, Alejandro A. Yu, Yi-Kuo |
author_facet | Altschul, Stephen F. Gertz, E. Michael Agarwala, Richa Schäffer, Alejandro A. Yu, Yi-Kuo |
author_sort | Altschul, Stephen F. |
collection | PubMed |
description | Position specific score matrices (PSSMs) are derived from multiple sequence alignments to aid in the recognition of distant protein sequence relationships. The PSI-BLAST protein database search program derives the column scores of its PSSMs with the aid of pseudocounts, added to the observed amino acid counts in a multiple alignment column. In the absence of theory, the number of pseudocounts used has been a completely empirical parameter. This article argues that the minimum description length principle can motivate the choice of this parameter. Specifically, for realistic alignments, the principle supports the practice of using a number of pseudocounts essentially independent of alignment size. However, it also implies that more highly conserved columns should use fewer pseudocounts, increasing the inter-column contrast of the implied PSSMs. A new method for calculating pseudocounts that significantly improves PSI-BLAST's retrieval accuracy is now employed by default. |
format | Text |
id | pubmed-2647318 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2009 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-26473182009-03-04 PSI-BLAST pseudocounts and the minimum description length principle Altschul, Stephen F. Gertz, E. Michael Agarwala, Richa Schäffer, Alejandro A. Yu, Yi-Kuo Nucleic Acids Res Computational Biology Position specific score matrices (PSSMs) are derived from multiple sequence alignments to aid in the recognition of distant protein sequence relationships. The PSI-BLAST protein database search program derives the column scores of its PSSMs with the aid of pseudocounts, added to the observed amino acid counts in a multiple alignment column. In the absence of theory, the number of pseudocounts used has been a completely empirical parameter. This article argues that the minimum description length principle can motivate the choice of this parameter. Specifically, for realistic alignments, the principle supports the practice of using a number of pseudocounts essentially independent of alignment size. However, it also implies that more highly conserved columns should use fewer pseudocounts, increasing the inter-column contrast of the implied PSSMs. A new method for calculating pseudocounts that significantly improves PSI-BLAST's retrieval accuracy is now employed by default. Oxford University Press 2009-02 2008-12-16 /pmc/articles/PMC2647318/ /pubmed/19088134 http://dx.doi.org/10.1093/nar/gkn981 Text en © 2008 The Author(s) http://creativecommons.org/licenses/by-nc/2.0/uk/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Computational Biology Altschul, Stephen F. Gertz, E. Michael Agarwala, Richa Schäffer, Alejandro A. Yu, Yi-Kuo PSI-BLAST pseudocounts and the minimum description length principle |
title | PSI-BLAST pseudocounts and the minimum description length principle |
title_full | PSI-BLAST pseudocounts and the minimum description length principle |
title_fullStr | PSI-BLAST pseudocounts and the minimum description length principle |
title_full_unstemmed | PSI-BLAST pseudocounts and the minimum description length principle |
title_short | PSI-BLAST pseudocounts and the minimum description length principle |
title_sort | psi-blast pseudocounts and the minimum description length principle |
topic | Computational Biology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2647318/ https://www.ncbi.nlm.nih.gov/pubmed/19088134 http://dx.doi.org/10.1093/nar/gkn981 |
work_keys_str_mv | AT altschulstephenf psiblastpseudocountsandtheminimumdescriptionlengthprinciple AT gertzemichael psiblastpseudocountsandtheminimumdescriptionlengthprinciple AT agarwalaricha psiblastpseudocountsandtheminimumdescriptionlengthprinciple AT schafferalejandroa psiblastpseudocountsandtheminimumdescriptionlengthprinciple AT yuyikuo psiblastpseudocountsandtheminimumdescriptionlengthprinciple |
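
The description field in this record says that pseudocounts are added to the observed amino acid counts in an alignment column, and that more highly conserved columns should use fewer of them. The record does not give the formula, so the following is only a generic sketch of that idea, not PSI-BLAST's actual implementation. The function names, the three-letter toy alphabet, the background frequencies, and the use of the background distribution as the pseudocount prior are all assumptions made for illustration.

```python
import math


def column_probabilities(observed_counts, prior_probs, num_pseudocounts):
    """Estimate residue probabilities for one alignment column.

    Hypothetical helper: blends observed counts c_a with a prior p_a as
    q_a = (c_a + m * p_a) / (n + m), where n is the total observed count
    and m is the number of pseudocounts. Every observed residue is
    assumed to appear as a key in prior_probs.
    """
    total = sum(observed_counts.values())
    return {
        residue: (observed_counts.get(residue, 0) + num_pseudocounts * prior)
        / (total + num_pseudocounts)
        for residue, prior in prior_probs.items()
    }


def column_scores(probs, background):
    """Log-odds scores (in bits) of the estimated column against background."""
    return {r: math.log2(probs[r] / background[r]) for r in probs}


# Toy alphabet and background frequencies (illustrative values only).
background = {"A": 0.5, "L": 0.25, "W": 0.25}
# A fairly conserved column: 8 of 10 residues are W.
counts = {"W": 8, "L": 2}

for m in (1.0, 10.0):
    q = column_probabilities(counts, background, m)
    s = column_scores(q, background)
    print(f"m={m}: q(W)={q['W']:.3f}, score(W)={s['W']:.2f} bits")
```

With fewer pseudocounts the estimate for W stays closer to its observed frequency of 0.8, so the column's scores contrast more sharply with the background. That is the effect the description attributes to using fewer pseudocounts in conserved columns.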