
Statistical distributions of optimal global alignment scores of random protein sequences

BACKGROUND: The inference of homology from statistically significant sequence similarity is a central issue in sequence alignments. So far the statistical distribution function underlying the optimal global alignments has not been completely determined. RESULTS: In this study, random and real but un...


Bibliographic Details
Main authors: Pang, Hongxia; Tang, Jiaowei; Chen, Su-Shing; Tao, Shiheng
Format: Text
Language: English
Published: BioMed Central 2005
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1276786/
https://www.ncbi.nlm.nih.gov/pubmed/16225696
http://dx.doi.org/10.1186/1471-2105-6-257
author Pang, Hongxia
Tang, Jiaowei
Chen, Su-Shing
Tao, Shiheng
collection PubMed
description BACKGROUND: The inference of homology from statistically significant sequence similarity is a central issue in sequence alignments. So far the statistical distribution function underlying the optimal global alignments has not been completely determined. RESULTS: In this study, random and real but unrelated sequences prepared in six different ways were selected as reference datasets to obtain their respective statistical distributions of global alignment scores. All alignments were carried out with the Needleman-Wunsch algorithm and optimal scores were fitted to the Gumbel, normal and gamma distributions respectively. The three-parameter gamma distribution performs the best as the theoretical distribution function of global alignment scores, as it agrees perfectly well with the distribution of alignment scores. The normal distribution also agrees well with the score distribution frequencies when the shape parameter of the gamma distribution is sufficiently large, for this is the scenario when the normal distribution can be viewed as an approximation of the gamma distribution. CONCLUSION: We have shown that the optimal global alignment scores of random protein sequences fit the three-parameter gamma distribution function. This would be useful for the inference of homology between sequences whose relationship is unknown, through the evaluation of gamma distribution significance between sequences.
format Text
id pubmed-1276786
institution National Center for Biotechnology Information
language English
publishDate 2005
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1276786 2005-11-16 Statistical distributions of optimal global alignment scores of random protein sequences. Pang, Hongxia; Tang, Jiaowei; Chen, Su-Shing; Tao, Shiheng. BMC Bioinformatics. Research Article. BioMed Central 2005-10-15 /pmc/articles/PMC1276786/ /pubmed/16225696 http://dx.doi.org/10.1186/1471-2105-6-257 Text en Copyright © 2005 Pang et al; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Statistical distributions of optimal global alignment scores of random protein sequences
topic Research Article
work_keys_str_mv AT panghongxia statisticaldistributionsofoptimalglobalalignmentscoresofrandomproteinsequences
AT tangjiaowei statisticaldistributionsofoptimalglobalalignmentscoresofrandomproteinsequences
AT chensushing statisticaldistributionsofoptimalglobalalignmentscoresofrandomproteinsequences
AT taoshiheng statisticaldistributionsofoptimalglobalalignmentscoresofrandomproteinsequences
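
Illustrative sketch (not part of the catalogued record): the description field summarizes a procedure in which random protein sequences are aligned with the Needleman-Wunsch algorithm and the optimal scores are fitted to a three-parameter gamma distribution. The Python below is a minimal sketch of that idea under stated assumptions. The scoring scheme (match 2, mismatch -1, linear gap -2), the uniform amino acid frequencies, and the method-of-moments fit are all assumptions chosen for brevity; the record does not state the substitution matrix, gap penalties, or fitting method the authors used. The function names nw_score, random_pair_scores and gamma3_moments are hypothetical.

```python
import random
import statistics

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def nw_score(a, b, match=2, mismatch=-1, gap=-2):
    """Optimal global alignment score (Needleman-Wunsch, linear gap penalty).

    The scoring values are illustrative assumptions, not the paper's.
    Only two rows of the dynamic-programming table are kept in memory.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def random_pair_scores(n_pairs, length, seed=0):
    """Scores of n_pairs alignments between uniformly random sequences."""
    rng = random.Random(seed)

    def random_protein():
        return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))

    return [nw_score(random_protein(), random_protein()) for _ in range(n_pairs)]


def gamma3_moments(scores):
    """Method-of-moments estimate of (shape, scale, location).

    Uses mean = location + shape * scale, variance = shape * scale**2
    and skewness = 2 / sqrt(shape); valid only for positive skewness.
    """
    n = len(scores)
    mean = statistics.fmean(scores)
    sd = statistics.pstdev(scores)
    skew = sum((s - mean) ** 3 for s in scores) / (n * sd ** 3)
    if skew <= 0:
        raise ValueError("sample skewness must be positive for a gamma fit")
    shape = 4.0 / skew ** 2
    scale = sd * skew / 2.0
    location = mean - shape * scale
    return shape, scale, location
```

A typical use would be to call random_pair_scores(1000, 100) and pass the result to gamma3_moments. A small sample may have non-positive skewness, in which case the moment fit raises ValueError; as the description notes, a large shape parameter is the regime where the normal distribution approximates the gamma distribution. A maximum-likelihood fit is a common alternative to the moment estimates shown here.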