A Simple Derivation of the Distribution of Pairwise Local Protein Sequence Alignment Scores

Confidence in pairwise alignments of biological sequences, obtained by methods such as BLAST or Smith-Waterman, is critical for automatic analyses of genomic data. In the asymptotic limit of long sequences, the Karlin-Altschul model computes a P-value assuming that the number of high scoring...

Bibliographic Details
Main author: Bastien, Olivier
Format: Text
Language: English
Published: Libertas Academica 2008
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2614193/
https://www.ncbi.nlm.nih.gov/pubmed/19204806
author Bastien, Olivier
collection PubMed
description Confidence in pairwise alignments of biological sequences, obtained by methods such as BLAST or Smith-Waterman, is critical for automatic analyses of genomic data. In the asymptotic limit of long sequences, the Karlin-Altschul model computes a P-value assuming that the number of high-scoring matching regions above a threshold is Poisson distributed. Using a simple approach combined with recent results in reliability theory, we demonstrate here that the Karlin-Altschul model can be derived without reference to extreme value theory. Sequences are treated as systems whose components are amino acids and which have a high redundancy of information, reflected by their alignment scores. The evolution of the information shared between aligned components determines the Shared Amount of Information (SA.I.) between sequences, i.e. the score. The parameters of the Gumbel distribution of aligned sequence scores thereby acquire a theoretical rationale. The first is the hazard rate of the distribution of scores between residues, and the second is the probability that two aligned residues do not lose bits of information (i.e. conserve an initial pairing score) when a mutation occurs.
format Text
id pubmed-2614193
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher Libertas Academica
record_format MEDLINE/PubMed
spelling pubmed-2614193 2009-02-09 Bastien, Olivier Evol Bioinform Online Rapid Communication Libertas Academica 2008-02-14 /pmc/articles/PMC2614193/ /pubmed/19204806 Text en Copyright © 2008 The authors. This article is published under the Creative Commons Attribution By licence. For further information go to: http://creativecommons.org/licenses/by/3.0
title A Simple Derivation of the Distribution of Pairwise Local Protein Sequence Alignment Scores
topic Rapid Communication
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2614193/
https://www.ncbi.nlm.nih.gov/pubmed/19204806