Cargando…
Accelerating phylogeny-aware alignment with indel evolution using short time Fourier transform
Recently we presented a frequentist dynamic programming (DP) approach for multiple sequence alignment based on the explicit model of indel evolution Poisson Indel Process (PIP). This phylogeny-aware approach produces evolutionary meaningful gap patterns and is robust to the ‘over-alignment’ bias. De...
Autores principales: | , , , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
Oxford University Press
2020
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7671320/ https://www.ncbi.nlm.nih.gov/pubmed/33575636 http://dx.doi.org/10.1093/nargab/lqaa092 |
Sumario: | Recently we presented a frequentist dynamic programming (DP) approach for multiple sequence alignment based on the explicit model of indel evolution Poisson Indel Process (PIP). This phylogeny-aware approach produces evolutionary meaningful gap patterns and is robust to the ‘over-alignment’ bias. Despite linear time complexity for the computation of marginal likelihoods, the overall method’s complexity is cubic in sequence length. Inspired by the popular aligner MAFFT, we propose a new technique to accelerate the evolutionary indel based alignment. Amino acid sequences are converted to sequences representing their physicochemical properties, and homologous blocks are identified by multi-scale short-time Fourier transform. Three three-dimensional DP matrices are then created under PIP, with homologous blocks defining sparse structures where most cells are excluded from the calculations. The homologous blocks are connected through intermediate ‘linking blocks’. The homologous and linking blocks are aligned under PIP as independent DP sub-matrices and their tracebacks merged to yield the final alignment. The new algorithm can largely profit from parallel computing, yielding a theoretical speed-up estimated to be proportional to the cubic power of the number of sub-blocks in the DP matrices. We compare the new method to the original PIP approach and demonstrate it on real data. |
---|