Retrieval accuracy, statistical significance and compositional similarity in protein sequence database searches
Protein sequence database search programs may be evaluated both for their retrieval accuracy—the ability to separate meaningful from chance similarities—and for the accuracy of their statistical assessments of reported alignments. However, methods for improving statistical accuracy can degrade retri...
Main Authors: | Yu, Yi-Kuo; Gertz, E. Michael; Agarwala, Richa; Schäffer, Alejandro A.; Altschul, Stephen F. |
---|---|
Format: | Text |
Language: | English |
Published: | Oxford University Press 2006 |
Subjects: | Computational Biology |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1635310/ https://www.ncbi.nlm.nih.gov/pubmed/17068079 http://dx.doi.org/10.1093/nar/gkl731 |
_version_ | 1782130683933622272 |
---|---|
author | Yu, Yi-Kuo; Gertz, E. Michael; Agarwala, Richa; Schäffer, Alejandro A.; Altschul, Stephen F. |
author_facet | Yu, Yi-Kuo; Gertz, E. Michael; Agarwala, Richa; Schäffer, Alejandro A.; Altschul, Stephen F. |
author_sort | Yu, Yi-Kuo |
collection | PubMed |
description | Protein sequence database search programs may be evaluated both for their retrieval accuracy—the ability to separate meaningful from chance similarities—and for the accuracy of their statistical assessments of reported alignments. However, methods for improving statistical accuracy can degrade retrieval accuracy by discarding compositional evidence of sequence relatedness. This evidence may be preserved by combining essentially independent measures of alignment and compositional similarity into a unified measure of sequence similarity. A version of the BLAST protein database search program, modified to employ this new measure, outperforms the baseline program in both retrieval and statistical accuracy on ASTRAL, a SCOP-based test set. |
format | Text |
id | pubmed-1635310 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2006 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-1635310 2006-12-26 Retrieval accuracy, statistical significance and compositional similarity in protein sequence database searches Yu, Yi-Kuo; Gertz, E. Michael; Agarwala, Richa; Schäffer, Alejandro A.; Altschul, Stephen F. Nucleic Acids Res Computational Biology Protein sequence database search programs may be evaluated both for their retrieval accuracy—the ability to separate meaningful from chance similarities—and for the accuracy of their statistical assessments of reported alignments. However, methods for improving statistical accuracy can degrade retrieval accuracy by discarding compositional evidence of sequence relatedness. This evidence may be preserved by combining essentially independent measures of alignment and compositional similarity into a unified measure of sequence similarity. A version of the BLAST protein database search program, modified to employ this new measure, outperforms the baseline program in both retrieval and statistical accuracy on ASTRAL, a SCOP-based test set. Oxford University Press 2006-11 2006-10-26 /pmc/articles/PMC1635310/ /pubmed/17068079 http://dx.doi.org/10.1093/nar/gkl731 Text en Published by Oxford University Press 2006 |
title | Retrieval accuracy, statistical significance and compositional similarity in protein sequence database searches |
topic | Computational Biology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1635310/ https://www.ncbi.nlm.nih.gov/pubmed/17068079 http://dx.doi.org/10.1093/nar/gkl731 |