The Cumulative Indel Model: Fast and Accurate Statistical Evolutionary Alignment

Sequence alignment is essential for phylogenetic and molecular evolution inference, as well as in many other areas of bioinformatics and evolutionary biology. Inaccurate alignments can lead to severe biases in most downstream statistical analyses. Statistical alignment based on probabilistic models...

Bibliographic Details
Main author: De Maio, Nicola
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8559576/
https://www.ncbi.nlm.nih.gov/pubmed/32653921
http://dx.doi.org/10.1093/sysbio/syaa050
_version_ 1784592786000969728
author De Maio, Nicola
author_facet De Maio, Nicola
author_sort De Maio, Nicola
collection PubMed
description Sequence alignment is essential for phylogenetic and molecular evolution inference, as well as in many other areas of bioinformatics and evolutionary biology. Inaccurate alignments can lead to severe biases in most downstream statistical analyses. Statistical alignment based on probabilistic models of sequence evolution addresses these issues by replacing heuristic score functions with evolutionary model-based probabilities. However, score-based aligners and fixed-alignment phylogenetic approaches are still more prevalent than methods based on evolutionary indel models, mostly due to computational convenience. Here, I present new techniques for improving the accuracy and speed of statistical evolutionary alignment. The “cumulative indel model” approximates realistic evolutionary indel dynamics using differential equations. “Adaptive banding” reduces the computational demand of most alignment algorithms without requiring prior knowledge of divergence levels or pseudo-optimal alignments. Using simulations, I show that these methods lead to fast and accurate pairwise alignment inference. Also, I show that it is possible, with these methods, to align and infer evolutionary parameters from a single long synteny block ([Formula: see text] 530 kbp) between the human and chimp genomes. The cumulative indel model and adaptive banding can therefore improve the performance of alignment and phylogenetic methods. [Evolutionary alignment; pairHMM; sequence evolution; statistical alignment; statistical genetics.]
format Online
Article
Text
id pubmed-8559576
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8559576 2021-11-02 The Cumulative Indel Model: Fast and Accurate Statistical Evolutionary Alignment De Maio, Nicola Syst Biol Regular Articles Oxford University Press 2020-07-12 /pmc/articles/PMC8559576/ /pubmed/32653921 http://dx.doi.org/10.1093/sysbio/syaa050 Text en © The Author(s) 2020. Published by Oxford University Press, on behalf of the Society of Systematic Biologists.
https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Regular Articles
De Maio, Nicola
The Cumulative Indel Model: Fast and Accurate Statistical Evolutionary Alignment
title The Cumulative Indel Model: Fast and Accurate Statistical Evolutionary Alignment
title_full The Cumulative Indel Model: Fast and Accurate Statistical Evolutionary Alignment
title_fullStr The Cumulative Indel Model: Fast and Accurate Statistical Evolutionary Alignment
title_full_unstemmed The Cumulative Indel Model: Fast and Accurate Statistical Evolutionary Alignment
title_short The Cumulative Indel Model: Fast and Accurate Statistical Evolutionary Alignment
title_sort cumulative indel model: fast and accurate statistical evolutionary alignment
topic Regular Articles
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8559576/
https://www.ncbi.nlm.nih.gov/pubmed/32653921
http://dx.doi.org/10.1093/sysbio/syaa050
work_keys_str_mv AT demaionicola thecumulativeindelmodelfastandaccuratestatisticalevolutionaryalignment
AT demaionicola cumulativeindelmodelfastandaccuratestatisticalevolutionaryalignment