
Statistical compression of protein sequences and inference of marginal probability landscapes over competing alignments using finite state models and Dirichlet priors

The information criterion of minimum message length (MML) provides a powerful statistical framework for inductive reasoning from observed data. We apply MML to the problem of protein sequence comparison using finite state models with Dirichlet distributions. The resulting framework allows us to supe...


Bibliographic Details
Main Authors: Sumanaweera, Dinithi, Allison, Lloyd, Konagurthu, Arun S
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6612809/
https://www.ncbi.nlm.nih.gov/pubmed/31510703
http://dx.doi.org/10.1093/bioinformatics/btz368
author Sumanaweera, Dinithi
Allison, Lloyd
Konagurthu, Arun S
collection PubMed
description The information criterion of minimum message length (MML) provides a powerful statistical framework for inductive reasoning from observed data. We apply MML to the problem of protein sequence comparison using finite state models with Dirichlet distributions. The resulting framework allows us to supersede the ad hoc cost functions commonly used in the field, by systematically addressing the problem of arbitrariness in alignment parameters, and the disconnect between substitution scores and gap costs. Furthermore, our framework enables the generation of marginal probability landscapes over all possible alignment hypotheses, with the potential to help users simultaneously rationalize and assess competing alignment relationships between protein sequences, beyond simply reporting a single (best) alignment. We demonstrate the performance of our program on benchmarks containing distantly related protein sequences. AVAILABILITY AND IMPLEMENTATION: The open-source program supporting this work is available from: http://lcb.infotech.monash.edu.au/seqmmligner. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-6612809
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6612809 2019-07-12 Bioinformatics Ismb/Eccb 2019 Conference Proceedings Oxford University Press 2019-07 2019-07-05 /pmc/articles/PMC6612809/ /pubmed/31510703 http://dx.doi.org/10.1093/bioinformatics/btz368 Text en © The Author(s) 2019. Published by Oxford University Press.
http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Statistical compression of protein sequences and inference of marginal probability landscapes over competing alignments using finite state models and Dirichlet priors
topic Ismb/Eccb 2019 Conference Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6612809/
https://www.ncbi.nlm.nih.gov/pubmed/31510703
http://dx.doi.org/10.1093/bioinformatics/btz368