Remote homology search with hidden Potts models
Most methods for biological sequence homology search and alignment work with primary sequence alone, neglecting higher-order correlations. Recently, statistical physics models called Potts models have been used to infer all-by-all pairwise correlations between sites in deep multiple sequence alignme...
Main Authors: | Wilburn, Grey W.; Eddy, Sean R. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2020 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7728182/ https://www.ncbi.nlm.nih.gov/pubmed/33253143 http://dx.doi.org/10.1371/journal.pcbi.1008085 |
Field | Value |
---|---|
author | Wilburn, Grey W.; Eddy, Sean R. |
collection | PubMed |
description | Most methods for biological sequence homology search and alignment work with primary sequence alone, neglecting higher-order correlations. Recently, statistical physics models called Potts models have been used to infer all-by-all pairwise correlations between sites in deep multiple sequence alignments, and these pairwise couplings have improved 3D structure predictions. Here we extend the use of Potts models from structure prediction to sequence alignment and homology search by developing what we call a hidden Potts model (HPM) that merges a Potts emission process to a generative probability model of insertion and deletion. Because an HPM is incompatible with efficient dynamic programming alignment algorithms, we develop an approximate algorithm based on importance sampling, using simpler probabilistic models as proposal distributions. We test an HPM implementation on RNA structure homology search benchmarks, where we can compare directly to exact alignment methods that capture nested RNA base-pairing correlations (stochastic context-free grammars). HPMs perform promisingly in these proof of principle experiments. |
format | Online Article Text |
id | pubmed-7728182 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-7728182; 2020-12-16; PLoS Comput Biol; Research Article; Public Library of Science; 2020-11-30; /pmc/articles/PMC7728182/ /pubmed/33253143 http://dx.doi.org/10.1371/journal.pcbi.1008085; Text; en; © 2020 Wilburn, Eddy. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
title | Remote homology search with hidden Potts models |
topic | Research Article |