
The Gumbel pre-factor k for gapped local alignment can be estimated from simulations of global alignment

The optimal gapped local alignment score of two random sequences follows a Gumbel distribution. The Gumbel distribution has two parameters, the scale parameter λ and the pre-factor k. Presently, the basic local alignment search tool (BLAST) programs (BLASTP (BLAST for proteins), PSI-BLAST, etc.) use...


Bibliographic Details
Main authors: Sheetlin, Sergey, Park, Yonil, Spouge, John L.
Format: Text
Language: English
Published: Oxford University Press 2005
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1199557/
https://www.ncbi.nlm.nih.gov/pubmed/16147981
http://dx.doi.org/10.1093/nar/gki800
author Sheetlin, Sergey
Park, Yonil
Spouge, John L.
collection PubMed
description The optimal gapped local alignment score of two random sequences follows a Gumbel distribution. The Gumbel distribution has two parameters, the scale parameter λ and the pre-factor k. Presently, the basic local alignment search tool (BLAST) programs (BLASTP (BLAST for proteins), PSI-BLAST, etc.) all use time-consuming computer simulations to determine the Gumbel parameters. Because the simulations must be done offline, BLAST users are restricted in their choice of alignment scoring schemes. The ultimate aim of this paper is to speed the simulations, to determine the Gumbel parameters online, and to remove the corresponding restrictions on BLAST users. Simulations for the scale parameter λ can be as much as five times faster if they use global instead of local alignment [R. Bundschuh (2002) J. Comput. Biol., 9, 243–260]. Unfortunately, the acceleration does not extend to determining the Gumbel pre-factor k, because k has no known mathematical relationship to global alignment. This paper relates k to global alignment and exploits the relationship to show that for the BLASTP defaults, 10 000 realizations with sequences of average length 140 suffice to estimate both Gumbel parameters λ and k within the errors required (λ, 0.8%; k, 10%). For the BLASTP defaults, simulations for both Gumbel parameters now take less than 30 s on a 2.8 GHz Pentium 4 processor.
format Text
id pubmed-1199557
institution National Center for Biotechnology Information
language English
publishDate 2005
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-1199557; 2005-09-15; Nucleic Acids Res; Article; Oxford University Press 2005; 2005-09-06; /pmc/articles/PMC1199557/; /pubmed/16147981; http://dx.doi.org/10.1093/nar/gki800; Text; en; © The Author 2005. Published by Oxford University Press. All rights reserved
title The Gumbel pre-factor k for gapped local alignment can be estimated from simulations of global alignment
topic Article