
Pairwise statistical significance of local sequence alignment using multiple parameter sets and empirical justification of parameter set change penalty

BACKGROUND: Accurate estimation of statistical significance of a pairwise alignment is an important problem in sequence comparison. Recently, a comparative study of pairwise statistical significance with database statistical significance was conducted. In this paper, we extend the earlier work on pa...


Bibliographic Details
Main authors: Agrawal, Ankit; Huang, Xiaoqiu
Format: Text
Language: English
Published: BioMed Central 2009
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2665049/
https://www.ncbi.nlm.nih.gov/pubmed/19344477
http://dx.doi.org/10.1186/1471-2105-10-S3-S1
author Agrawal, Ankit
Huang, Xiaoqiu
author_facet Agrawal, Ankit
Huang, Xiaoqiu
author_sort Agrawal, Ankit
collection PubMed
description BACKGROUND: Accurate estimation of the statistical significance of a pairwise alignment is an important problem in sequence comparison. Recently, a comparative study of pairwise statistical significance with database statistical significance was conducted. In this paper, we extend the earlier work on pairwise statistical significance by incorporating the use of multiple parameter sets. RESULTS: Results for a knowledge discovery application of homology detection reveal that using multiple parameter sets for pairwise statistical significance estimates gives better coverage than using a single parameter set, at least at some error levels. Further, the results of pairwise statistical significance using multiple parameter sets are shown to be significantly better than the database statistical significance estimates reported by BLAST and PSI-BLAST, and comparable to, and at times significantly better than, those of SSEARCH. Using non-zero parameter set change penalty values gives better performance than a zero penalty. CONCLUSION: The fact that homology detection performance does not degrade when using multiple parameter sets is strong evidence for the validity of the assumption that the alignment score distribution follows an extreme value distribution even when multiple parameter sets are used. The parameter set change penalty is a useful parameter for alignment using multiple parameter sets. Pairwise statistical significance using multiple parameter sets can be used effectively to determine the relatedness of one or a few pairs of sequences without performing a time-consuming database search.
format Text
id pubmed-2665049
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2665049 2009-04-06 Pairwise statistical significance of local sequence alignment using multiple parameter sets and empirical justification of parameter set change penalty Agrawal, Ankit Huang, Xiaoqiu BMC Bioinformatics Proceedings
BioMed Central 2009-03-19 /pmc/articles/PMC2665049/ /pubmed/19344477 http://dx.doi.org/10.1186/1471-2105-10-S3-S1 Text en Copyright © 2009 Agrawal and Huang; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Pairwise statistical significance of local sequence alignment using multiple parameter sets and empirical justification of parameter set change penalty
topic Proceedings