
Retrieval accuracy, statistical significance and compositional similarity in protein sequence database searches


Bibliographic Details
Main Authors: Yu, Yi-Kuo; Gertz, E. Michael; Agarwala, Richa; Schäffer, Alejandro A.; Altschul, Stephen F.
Format: Text
Language: English
Published: Oxford University Press 2006
Subjects: Computational Biology
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1635310/
https://www.ncbi.nlm.nih.gov/pubmed/17068079
http://dx.doi.org/10.1093/nar/gkl731
author Yu, Yi-Kuo
Gertz, E. Michael
Agarwala, Richa
Schäffer, Alejandro A.
Altschul, Stephen F.
collection PubMed
description Protein sequence database search programs may be evaluated both for their retrieval accuracy—the ability to separate meaningful from chance similarities—and for the accuracy of their statistical assessments of reported alignments. However, methods for improving statistical accuracy can degrade retrieval accuracy by discarding compositional evidence of sequence relatedness. This evidence may be preserved by combining essentially independent measures of alignment and compositional similarity into a unified measure of sequence similarity. A version of the BLAST protein database search program, modified to employ this new measure, outperforms the baseline program in both retrieval and statistical accuracy on ASTRAL, a SCOP-based test set.
format Text
id pubmed-1635310
institution National Center for Biotechnology Information
language English
publishDate 2006
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-1635310 2006-12-26 Retrieval accuracy, statistical significance and compositional similarity in protein sequence database searches; Yu, Yi-Kuo; Gertz, E. Michael; Agarwala, Richa; Schäffer, Alejandro A.; Altschul, Stephen F.; Nucleic Acids Res; Computational Biology; Oxford University Press 2006-11 2006-10-26 /pmc/articles/PMC1635310/ /pubmed/17068079 http://dx.doi.org/10.1093/nar/gkl731 Text en Published by Oxford University Press 2006
title Retrieval accuracy, statistical significance and compositional similarity in protein sequence database searches
topic Computational Biology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1635310/
https://www.ncbi.nlm.nih.gov/pubmed/17068079
http://dx.doi.org/10.1093/nar/gkl731