Cargando…

Accelerating phylogeny-aware alignment with indel evolution using short time Fourier transform

Recently we presented a frequentist dynamic programming (DP) approach for multiple sequence alignment based on the explicit model of indel evolution Poisson Indel Process (PIP). This phylogeny-aware approach produces evolutionary meaningful gap patterns and is robust to the ‘over-alignment’ bias. De...

Descripción completa

Detalles Bibliográficos
Autores principales: Maiolo, Massimo, Ulzega, Simone, Gil, Manuel, Anisimova, Maria
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Oxford University Press 2020
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7671320/
https://www.ncbi.nlm.nih.gov/pubmed/33575636
http://dx.doi.org/10.1093/nargab/lqaa092
_version_ 1783610907340832768
author Maiolo, Massimo
Ulzega, Simone
Gil, Manuel
Anisimova, Maria
author_facet Maiolo, Massimo
Ulzega, Simone
Gil, Manuel
Anisimova, Maria
author_sort Maiolo, Massimo
collection PubMed
description Recently we presented a frequentist dynamic programming (DP) approach for multiple sequence alignment based on the explicit model of indel evolution Poisson Indel Process (PIP). This phylogeny-aware approach produces evolutionary meaningful gap patterns and is robust to the ‘over-alignment’ bias. Despite linear time complexity for the computation of marginal likelihoods, the overall method’s complexity is cubic in sequence length. Inspired by the popular aligner MAFFT, we propose a new technique to accelerate the evolutionary indel based alignment. Amino acid sequences are converted to sequences representing their physicochemical properties, and homologous blocks are identified by multi-scale short-time Fourier transform. Three three-dimensional DP matrices are then created under PIP, with homologous blocks defining sparse structures where most cells are excluded from the calculations. The homologous blocks are connected through intermediate ‘linking blocks’. The homologous and linking blocks are aligned under PIP as independent DP sub-matrices and their tracebacks merged to yield the final alignment. The new algorithm can largely profit from parallel computing, yielding a theoretical speed-up estimated to be proportional to the cubic power of the number of sub-blocks in the DP matrices. We compare the new method to the original PIP approach and demonstrate it on real data.
format Online
Article
Text
id pubmed-7671320
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-76713202021-02-10 Accelerating phylogeny-aware alignment with indel evolution using short time Fourier transform Maiolo, Massimo Ulzega, Simone Gil, Manuel Anisimova, Maria NAR Genom Bioinform Methods Article Recently we presented a frequentist dynamic programming (DP) approach for multiple sequence alignment based on the explicit model of indel evolution Poisson Indel Process (PIP). This phylogeny-aware approach produces evolutionary meaningful gap patterns and is robust to the ‘over-alignment’ bias. Despite linear time complexity for the computation of marginal likelihoods, the overall method’s complexity is cubic in sequence length. Inspired by the popular aligner MAFFT, we propose a new technique to accelerate the evolutionary indel based alignment. Amino acid sequences are converted to sequences representing their physicochemical properties, and homologous blocks are identified by multi-scale short-time Fourier transform. Three three-dimensional DP matrices are then created under PIP, with homologous blocks defining sparse structures where most cells are excluded from the calculations. The homologous blocks are connected through intermediate ‘linking blocks’. The homologous and linking blocks are aligned under PIP as independent DP sub-matrices and their tracebacks merged to yield the final alignment. The new algorithm can largely profit from parallel computing, yielding a theoretical speed-up estimated to be proportional to the cubic power of the number of sub-blocks in the DP matrices. We compare the new method to the original PIP approach and demonstrate it on real data. Oxford University Press 2020-11-06 /pmc/articles/PMC7671320/ /pubmed/33575636 http://dx.doi.org/10.1093/nargab/lqaa092 Text en © The Author(s) 2019. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
spellingShingle Methods Article
Maiolo, Massimo
Ulzega, Simone
Gil, Manuel
Anisimova, Maria
Accelerating phylogeny-aware alignment with indel evolution using short time Fourier transform
title Accelerating phylogeny-aware alignment with indel evolution using short time Fourier transform
title_full Accelerating phylogeny-aware alignment with indel evolution using short time Fourier transform
title_fullStr Accelerating phylogeny-aware alignment with indel evolution using short time Fourier transform
title_full_unstemmed Accelerating phylogeny-aware alignment with indel evolution using short time Fourier transform
title_short Accelerating phylogeny-aware alignment with indel evolution using short time Fourier transform
title_sort accelerating phylogeny-aware alignment with indel evolution using short time fourier transform
topic Methods Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7671320/
https://www.ncbi.nlm.nih.gov/pubmed/33575636
http://dx.doi.org/10.1093/nargab/lqaa092
work_keys_str_mv AT maiolomassimo acceleratingphylogenyawarealignmentwithindelevolutionusingshorttimefouriertransform
AT ulzegasimone acceleratingphylogenyawarealignmentwithindelevolutionusingshorttimefouriertransform
AT gilmanuel acceleratingphylogenyawarealignmentwithindelevolutionusingshorttimefouriertransform
AT anisimovamaria acceleratingphylogenyawarealignmentwithindelevolutionusingshorttimefouriertransform