EvoLSTM: context-dependent models of sequence evolution using a sequence-to-sequence LSTM
MOTIVATION: Accurate probabilistic models of sequence evolution are essential for a wide variety of bioinformatics tasks, including sequence alignment and phylogenetic inference. The ability to realistically simulate sequence evolution is also at the core of many benchmarking strategies. Yet, mutati...
Main Authors: | Lim, Dongjoon; Blanchette, Mathieu |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2020 |
Subjects: | Population Genomics and Molecular Evolution |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7355264/ https://www.ncbi.nlm.nih.gov/pubmed/32657367 http://dx.doi.org/10.1093/bioinformatics/btaa447 |
_version_ | 1783558240019152896 |
---|---|
author | Lim, Dongjoon; Blanchette, Mathieu |
collection | PubMed |
description | MOTIVATION: Accurate probabilistic models of sequence evolution are essential for a wide variety of bioinformatics tasks, including sequence alignment and phylogenetic inference. The ability to realistically simulate sequence evolution is also at the core of many benchmarking strategies. Yet, mutational processes have complex context dependencies that remain poorly modeled and understood. RESULTS: We introduce EvoLSTM, a recurrent neural network-based evolution simulator that captures mutational context dependencies. EvoLSTM uses a sequence-to-sequence long short-term memory model trained to predict mutation probabilities at each position of a given sequence, taking into consideration the 14 flanking nucleotides. EvoLSTM can realistically simulate mammalian and plant DNA sequence evolution and reveals unexpectedly strong long-range context dependencies in mutation probabilities. EvoLSTM brings modern machine-learning approaches to bear on sequence evolution. It will serve as a useful tool to study and simulate complex mutational processes. AVAILABILITY AND IMPLEMENTATION: Code and dataset are available at https://github.com/DongjoonLim/EvoLSTM. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-7355264 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-7355264 2020-07-16 EvoLSTM: context-dependent models of sequence evolution using a sequence-to-sequence LSTM Lim, Dongjoon; Blanchette, Mathieu Bioinformatics Population Genomics and Molecular Evolution Oxford University Press 2020-07 2020-07-13 /pmc/articles/PMC7355264/ /pubmed/32657367 http://dx.doi.org/10.1093/bioinformatics/btaa447 Text en © The Author(s) 2020. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
title | EvoLSTM: context-dependent models of sequence evolution using a sequence-to-sequence LSTM |
topic | Population Genomics and Molecular Evolution |