
EvoLSTM: context-dependent models of sequence evolution using a sequence-to-sequence LSTM

MOTIVATION: Accurate probabilistic models of sequence evolution are essential for a wide variety of bioinformatics tasks, including sequence alignment and phylogenetic inference. The ability to realistically simulate sequence evolution is also at the core of many benchmarking strategies. Yet, mutati...


Bibliographic Details
Main authors: Lim, Dongjoon; Blanchette, Mathieu
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7355264/
https://www.ncbi.nlm.nih.gov/pubmed/32657367
http://dx.doi.org/10.1093/bioinformatics/btaa447
_version_ 1783558240019152896
author Lim, Dongjoon
Blanchette, Mathieu
author_facet Lim, Dongjoon
Blanchette, Mathieu
author_sort Lim, Dongjoon
collection PubMed
description MOTIVATION: Accurate probabilistic models of sequence evolution are essential for a wide variety of bioinformatics tasks, including sequence alignment and phylogenetic inference. The ability to realistically simulate sequence evolution is also at the core of many benchmarking strategies. Yet, mutational processes have complex context dependencies that remain poorly modeled and understood. RESULTS: We introduce EvoLSTM, a recurrent neural network-based evolution simulator that captures mutational context dependencies. EvoLSTM uses a sequence-to-sequence long short-term memory model trained to predict mutation probabilities at each position of a given sequence, taking into consideration the 14 flanking nucleotides. EvoLSTM can realistically simulate mammalian and plant DNA sequence evolution and reveals unexpectedly strong long-range context dependencies in mutation probabilities. EvoLSTM brings modern machine-learning approaches to bear on sequence evolution. It will serve as a useful tool to study and simulate complex mutational processes. AVAILABILITY AND IMPLEMENTATION: Code and dataset are available at https://github.com/DongjoonLim/EvoLSTM. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-7355264
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-73552642020-07-16 EvoLSTM: context-dependent models of sequence evolution using a sequence-to-sequence LSTM Lim, Dongjoon Blanchette, Mathieu Bioinformatics Population Genomics and Molecular Evolution MOTIVATION: Accurate probabilistic models of sequence evolution are essential for a wide variety of bioinformatics tasks, including sequence alignment and phylogenetic inference. The ability to realistically simulate sequence evolution is also at the core of many benchmarking strategies. Yet, mutational processes have complex context dependencies that remain poorly modeled and understood. RESULTS: We introduce EvoLSTM, a recurrent neural network-based evolution simulator that captures mutational context dependencies. EvoLSTM uses a sequence-to-sequence long short-term memory model trained to predict mutation probabilities at each position of a given sequence, taking into consideration the 14 flanking nucleotides. EvoLSTM can realistically simulate mammalian and plant DNA sequence evolution and reveals unexpectedly strong long-range context dependencies in mutation probabilities. EvoLSTM brings modern machine-learning approaches to bear on sequence evolution. It will serve as a useful tool to study and simulate complex mutational processes. AVAILABILITY AND IMPLEMENTATION: Code and dataset are available at https://github.com/DongjoonLim/EvoLSTM. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. Oxford University Press 2020-07 2020-07-13 /pmc/articles/PMC7355264/ /pubmed/32657367 http://dx.doi.org/10.1093/bioinformatics/btaa447 Text en © The Author(s) 2020. Published by Oxford University Press. 
http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
spellingShingle Population Genomics and Molecular Evolution
Lim, Dongjoon
Blanchette, Mathieu
EvoLSTM: context-dependent models of sequence evolution using a sequence-to-sequence LSTM
title EvoLSTM: context-dependent models of sequence evolution using a sequence-to-sequence LSTM
topic Population Genomics and Molecular Evolution
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7355264/
https://www.ncbi.nlm.nih.gov/pubmed/32657367
http://dx.doi.org/10.1093/bioinformatics/btaa447
work_keys_str_mv AT limdongjoon evolstmcontextdependentmodelsofsequenceevolutionusingasequencetosequencelstm
AT blanchettemathieu evolstmcontextdependentmodelsofsequenceevolutionusingasequencetosequencelstm
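
The description field above states that the model predicts mutation probabilities at each position while taking the 14 flanking nucleotides into account. A minimal sketch of what extracting such a per-position context window could look like follows. It is not the authors' code: the even split of 7 nucleotides per side, the padding symbol `N`, and the function name `context_windows` are all assumptions made for illustration, and the record itself does not say how the 14 nucleotides are distributed.

```python
from typing import List

FLANK = 7   # assumption: 14 flanking nucleotides split as 7 on each side
PAD = "N"   # hypothetical padding symbol for positions beyond the sequence ends


def context_windows(seq: str, flank: int = FLANK) -> List[str]:
    """Return, for every position, that base plus `flank` neighbours per side."""
    # Pad both ends so that positions near the boundaries still get full windows.
    padded = PAD * flank + seq.upper() + PAD * flank
    width = 2 * flank + 1
    return [padded[i:i + width] for i in range(len(seq))]


# One window per position; the central character is the base being modelled.
windows = context_windows("ACGTACGTAC")
```

Each window is what a context-dependent model would see as input for one position; a trained model (not shown here) would map it to a probability distribution over outcomes at the central base.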