
Remote homology search with hidden Potts models

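Most methods for biological sequence homology search and alignment work with primary sequence alone, neglecting higher-order correlations. Recently, statistical physics models called Potts models have been used to infer all-by-all pairwise correlations between sites in deep multiple sequence alignments, and these pairwise couplings have improved 3D structure predictions. Here we extend the use of Potts models from structure prediction to sequence alignment and homology search by developing what we call a hidden Potts model (HPM) that merges a Potts emission process to a generative probability model of insertion and deletion. Because an HPM is incompatible with efficient dynamic programming alignment algorithms, we develop an approximate algorithm based on importance sampling, using simpler probabilistic models as proposal distributions. We test an HPM implementation on RNA structure homology search benchmarks, where we can compare directly to exact alignment methods that capture nested RNA base-pairing correlations (stochastic context-free grammars). HPMs perform promisingly in these proof of principle experiments.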
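The two ingredients named in the abstract can be illustrated with a toy sketch: a Potts log-potential (per-site fields plus pairwise couplings) and an importance-sampling estimate of a sum over hidden states, using a simpler independent-site model as the proposal distribution. This is not the authors' HPM implementation. The insertion/deletion process is omitted. The hidden variable summed over here is the whole site configuration rather than an alignment, so the exact answer can be enumerated for comparison. All function names and parameter values (`potts_log_potential`, `importance_log_estimate`, `h`, `J`) are hypothetical.

```python
import math
import random
from itertools import product


def potts_log_potential(seq, h, J):
    """Unnormalized Potts log-potential: site fields plus pairwise couplings.

    seq: one integer state per model site (hypothetical encoding).
    h:   h[i][a] is the field for state a at site i.
    J:   maps a site pair (i, j), i < j, to a matrix with J[(i, j)][a][b].
    """
    total = sum(h[i][a] for i, a in enumerate(seq))
    for (i, j), J_ij in J.items():
        total += J_ij[seq[i]][seq[j]]
    return total


def log_sum_exp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def importance_log_estimate(log_target, sample_proposal, log_proposal, n, rng):
    """Estimate log sum_z exp(log_target(z)) from n draws z ~ q.

    Each draw is weighted by target(z) / q(z). The mean weight is an unbiased
    estimate of the sum as long as q(z) > 0 wherever the target is nonzero.
    """
    log_w = []
    for _ in range(n):
        z = sample_proposal(rng)
        log_w.append(log_target(z) - log_proposal(z))
    return log_sum_exp(log_w) - math.log(n)


# Toy parameters (hypothetical): 3 sites, 2 states per site.
h = [[0.5, -0.5], [0.0, 0.2], [-0.3, 0.3]]
J = {(0, 2): [[1.0, -1.0], [-1.0, 1.0]]}  # rewards sites 0 and 2 agreeing


def site_probs(h_i):
    """Softmax of one site's fields."""
    m = max(h_i)
    e = [math.exp(x - m) for x in h_i]
    s = sum(e)
    return [x / s for x in e]


# Proposal: an independent-site model built from the fields alone (no couplings).
q_sites = [site_probs(h_i) for h_i in h]


def sample_q(rng):
    """Draw one configuration, each site independently from its softmax."""
    return [rng.choices(range(len(p)), weights=p)[0] for p in q_sites]


def log_q(seq):
    """Log-probability of a configuration under the independent-site proposal."""
    return sum(math.log(q_sites[i][a]) for i, a in enumerate(seq))


def log_target(seq):
    return potts_log_potential(seq, h, J)


# Exact answer by brute-force enumeration (only feasible for tiny toys).
exact = log_sum_exp([log_target(list(s)) for s in product(range(2), repeat=3)])
est = importance_log_estimate(log_target, sample_q, log_q, 20000, random.Random(0))
print(f"exact log Z = {exact:.4f}, importance-sampling estimate = {est:.4f}")
```

Because this proposal captures the site fields but not the coupling, the importance weights here vary only with whether sites 0 and 2 agree. In general, the closer the proposal is to the target, the lower the variance of the estimate. This is why a simpler model that already captures much of the signal can serve as a useful proposal.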


Bibliographic Details
Main authors: Wilburn, Grey W., Eddy, Sean R.
Format: Online Article Text
Language: English
Published: Public Library of Science 2020
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7728182/
https://www.ncbi.nlm.nih.gov/pubmed/33253143
http://dx.doi.org/10.1371/journal.pcbi.1008085
Journal: PLoS Comput Biol
Published online: 2020-11-30
Record ID: pubmed-7728182 (collection: PubMed; record format: MEDLINE/PubMed; institution: National Center for Biotechnology Information)
License: © 2020 Wilburn, Eddy. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.