
How sequence alignment scores correspond to probability models

MOTIVATION: Sequence alignment remains fundamental in bioinformatics. Pair-wise alignment is traditionally based on ad hoc scores for substitutions, insertions and deletions, but can also be based on probability models (pair hidden Markov models: PHMMs). PHMMs enable us to: fit the parameters to eac...


Bibliographic Details
Main author: Frith, Martin C
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Subjects: Original Papers
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9883716/
https://www.ncbi.nlm.nih.gov/pubmed/31329241
http://dx.doi.org/10.1093/bioinformatics/btz576
collection PubMed
description MOTIVATION: Sequence alignment remains fundamental in bioinformatics. Pair-wise alignment is traditionally based on ad hoc scores for substitutions, insertions and deletions, but can also be based on probability models (pair hidden Markov models: PHMMs). PHMMs enable us to: fit the parameters to each kind of data, calculate the reliability of alignment parts and measure sequence similarity integrated over possible alignments. RESULTS: This study shows how multiple models correspond to one set of scores. Scores can be converted to probabilities by partition functions with a ‘temperature’ parameter: for any temperature, this corresponds to some PHMM. There is a special class of models with balanced length probability, i.e. no bias toward either longer or shorter alignments. The best way to score alignments and assess their significance depends on the aim: judging whether whole sequences are related versus finding related parts. This clarifies the statistical basis of sequence alignment. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
id pubmed-9883716
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Bioinformatics
published Oxford University Press 2019-07-22
license © The Author(s) 2019. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
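The abstract states that alignment scores can be converted to probabilities by partition functions with a 'temperature' parameter. The following is a minimal sketch of that idea, not the author's code: it assumes a toy global alignment with match, mismatch and linear gap scores (all values hypothetical), weights every alignment by exp(score / temperature), and sums those weights with a forward-style recursion. The function names `logsumexp` and `align_scores` are invented for illustration, and each distinct path through the dynamic-programming grid is counted as a distinct alignment.

```python
import math


def logsumexp(values):
    """Return log(sum(exp(v))) computed stably."""
    hi = max(values)
    if hi == float("-inf"):
        return hi
    return hi + math.log(sum(math.exp(v - hi) for v in values))


def align_scores(a, b, match=1.0, mismatch=-1.0, gap=-2.0, temperature=1.0):
    """Return (best_score, log_partition) for global alignment of a and b.

    best_score is the usual maximum alignment score (Needleman-Wunsch).
    log_partition is log Z, where Z sums exp(score / temperature) over
    all alignments. The scoring values are illustrative assumptions.
    """
    n, m = len(a), len(b)
    neg_inf = float("-inf")
    best = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    log_z = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    log_z[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            steps = []  # (previous cell, score of the step into this cell)
            if i > 0 and j > 0:
                s = match if a[i - 1] == b[j - 1] else mismatch
                steps.append(((i - 1, j - 1), s))
            if i > 0:
                steps.append(((i - 1, j), gap))  # letter of a against a gap
            if j > 0:
                steps.append(((i, j - 1), gap))  # letter of b against a gap
            # Maximum over predecessors gives the best score ...
            best[i][j] = max(best[p][q] + s for (p, q), s in steps)
            # ... and a sum over predecessors gives the partition function.
            log_z[i][j] = logsumexp(
                [log_z[p][q] + s / temperature for (p, q), s in steps]
            )
    return best[n][m], log_z[n][m]


def best_alignment_probability(a, b, temperature=1.0):
    """Probability of the top-scoring alignment under the exp(score / T) weights."""
    best, log_z = align_scores(a, b, temperature=temperature)
    return math.exp(best / temperature - log_z)
```

In this sketch, lowering the temperature concentrates the probability on the top-scoring alignment, so that temperature times log Z approaches the best score, while higher temperatures spread it over alternative alignments. Summing over all alignments in this way is one reading of what the abstract calls measuring similarity integrated over possible alignments.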