The Cumulative Indel Model: Fast and Accurate Statistical Evolutionary Alignment
Sequence alignment is essential for phylogenetic and molecular evolution inference, as well as in many other areas of bioinformatics and evolutionary biology. Inaccurate alignments can lead to severe biases in most downstream statistical analyses. Statistical alignment based on probabilistic models...
Main author: | De Maio, Nicola |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2020 |
Subjects: | Regular Articles |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8559576/ https://www.ncbi.nlm.nih.gov/pubmed/32653921 http://dx.doi.org/10.1093/sysbio/syaa050 |
_version_ | 1784592786000969728 |
---|---|
author | De Maio, Nicola |
author_facet | De Maio, Nicola |
author_sort | De Maio, Nicola |
collection | PubMed |
description | Sequence alignment is essential for phylogenetic and molecular evolution inference, as well as in many other areas of bioinformatics and evolutionary biology. Inaccurate alignments can lead to severe biases in most downstream statistical analyses. Statistical alignment based on probabilistic models of sequence evolution addresses these issues by replacing heuristic score functions with evolutionary model-based probabilities. However, score-based aligners and fixed-alignment phylogenetic approaches are still more prevalent than methods based on evolutionary indel models, mostly due to computational convenience. Here, I present new techniques for improving the accuracy and speed of statistical evolutionary alignment. The “cumulative indel model” approximates realistic evolutionary indel dynamics using differential equations. “Adaptive banding” reduces the computational demand of most alignment algorithms without requiring prior knowledge of divergence levels or pseudo-optimal alignments. Using simulations, I show that these methods lead to fast and accurate pairwise alignment inference. Also, I show that it is possible, with these methods, to align and infer evolutionary parameters from a single long synteny block (≈530 kbp) between the human and chimp genomes. The cumulative indel model and adaptive banding can therefore improve the performance of alignment and phylogenetic methods. [Evolutionary alignment; pairHMM; sequence evolution; statistical alignment; statistical genetics.] |
format | Online Article Text |
id | pubmed-8559576 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-8559576 2021-11-02 The Cumulative Indel Model: Fast and Accurate Statistical Evolutionary Alignment De Maio, Nicola Syst Biol Regular Articles Oxford University Press 2020-07-12 /pmc/articles/PMC8559576/ /pubmed/32653921 http://dx.doi.org/10.1093/sysbio/syaa050 Text en © The Author(s) 2020. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Regular Articles De Maio, Nicola The Cumulative Indel Model: Fast and Accurate Statistical Evolutionary Alignment |
title | The Cumulative Indel Model: Fast and Accurate Statistical Evolutionary Alignment |
title_full | The Cumulative Indel Model: Fast and Accurate Statistical Evolutionary Alignment |
title_fullStr | The Cumulative Indel Model: Fast and Accurate Statistical Evolutionary Alignment |
title_full_unstemmed | The Cumulative Indel Model: Fast and Accurate Statistical Evolutionary Alignment |
title_short | The Cumulative Indel Model: Fast and Accurate Statistical Evolutionary Alignment |
title_sort | cumulative indel model: fast and accurate statistical evolutionary alignment |
topic | Regular Articles |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8559576/ https://www.ncbi.nlm.nih.gov/pubmed/32653921 http://dx.doi.org/10.1093/sysbio/syaa050 |
work_keys_str_mv | AT demaionicola thecumulativeindelmodelfastandaccuratestatisticalevolutionaryalignment AT demaionicola cumulativeindelmodelfastandaccuratestatisticalevolutionaryalignment |