Powerful fusion: PSI-BLAST and consensus sequences

Motivation: A typical PSI-BLAST search consists of iterative scanning and alignment of a large sequence database during which a scoring profile is progressively built and refined. Such a profile can also be stored and used to search against a different database of sequences. Using it to search again...

Bibliographic Details
Main authors: Przybylski, Dariusz; Rost, Burkhard
Format: Text
Language: English
Published: Oxford University Press 2008
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2577777/
https://www.ncbi.nlm.nih.gov/pubmed/18678588
http://dx.doi.org/10.1093/bioinformatics/btn384
_version_ 1782160509834887168
author Przybylski, Dariusz
Rost, Burkhard
author_facet Przybylski, Dariusz
Rost, Burkhard
author_sort Przybylski, Dariusz
collection PubMed
description Motivation: A typical PSI-BLAST search consists of iterative scanning and alignment of a large sequence database during which a scoring profile is progressively built and refined. Such a profile can also be stored and used to search against a different database of sequences. Using it to search against a database of consensus rather than native sequences is a simple add-on that boosts performance surprisingly well. The improvement comes at a price: we hypothesized that random alignment score statistics would differ between native and consensus sequences. Thus PSI-BLAST-based profile searches against consensus sequences might incorrectly estimate statistical significance of alignment scores. In addition, iterative searches against consensus databases may fail. Here, we addressed these challenges in an attempt to harness the full power of the combination of PSI-BLAST and consensus sequences. Results: We studied alignment score statistics for various types of consensus sequences. In general, the score distribution parameters of profile-based consensus sequence alignments differed significantly from those derived for the native sequences. PSI-BLAST partially compensated for the parameter variation. We have identified a protocol for building specialized consensus sequences that significantly improved search sensitivity and preserved score distribution parameters. As a result, PSI-BLAST profiles can be used to search specialized consensus sequences without sacrificing estimates of statistical significance. We also provided results indicating that iterative PSI-BLAST searches against consensus sequences could work very well. Overall, we showed how a very popular and effective method could be used to identify significantly more relevant similarities among protein sequences. Availability: http://www.rostlab.org/services/consensus/ Contact: dariusz@mit.edu
format Text
id pubmed-2577777
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-2577777 2008-11-04 Powerful fusion: PSI-BLAST and consensus sequences Przybylski, Dariusz Rost, Burkhard Bioinformatics Original Papers Oxford University Press 2008-09-15 2008-08-04 /pmc/articles/PMC2577777/ /pubmed/18678588 http://dx.doi.org/10.1093/bioinformatics/btn384 Text en © 2008 The Author(s) http://creativecommons.org/licenses/by-nc/2.0/uk/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Original Papers
Przybylski, Dariusz
Rost, Burkhard
Powerful fusion: PSI-BLAST and consensus sequences
title Powerful fusion: PSI-BLAST and consensus sequences
title_full Powerful fusion: PSI-BLAST and consensus sequences
title_fullStr Powerful fusion: PSI-BLAST and consensus sequences
title_full_unstemmed Powerful fusion: PSI-BLAST and consensus sequences
title_short Powerful fusion: PSI-BLAST and consensus sequences
title_sort powerful fusion: psi-blast and consensus sequences
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2577777/
https://www.ncbi.nlm.nih.gov/pubmed/18678588
http://dx.doi.org/10.1093/bioinformatics/btn384
work_keys_str_mv AT przybylskidariusz powerfulfusionpsiblastandconsensussequences
AT rostburkhard powerfulfusionpsiblastandconsensussequences