Supervised machine learning approach to molecular dynamics forecast of SARS-CoV-2 spike glycoproteins at varying temperatures

ABSTRACT: Molecular dynamics (MD) simulations are a widely used technique in modeling complex nanoscale interactions of atoms and molecules. These simulations can provide detailed insight into how molecules behave under certain environmental conditions. This work explores a machine learning (ML) sol...

Bibliographic Details
Main authors: Liang, David; Song, Meichen; Niu, Ziyuan; Zhang, Peng; Rafailovich, Miriam; Deng, Yuefan
Format: Online Article Text
Language: English
Published: Springer International Publishing 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7888691/
https://www.ncbi.nlm.nih.gov/pubmed/33619443
http://dx.doi.org/10.1557/s43580-021-00021-4
author Liang, David
Song, Meichen
Niu, Ziyuan
Zhang, Peng
Rafailovich, Miriam
Deng, Yuefan
collection PubMed
description ABSTRACT: Molecular dynamics (MD) simulations are a widely used technique in modeling complex nanoscale interactions of atoms and molecules. These simulations can provide detailed insight into how molecules behave under certain environmental conditions. This work explores a machine learning (ML) solution to predicting long-term properties of SARS-CoV-2 spike glycoproteins (S-protein) through the analysis of its nanosecond backbone RMSD (root-mean-square deviation) MD simulation data at varying temperatures. The simulation data were denoised with fast Fourier transforms. The performance of the models was measured by evaluating their mean squared error (MSE) accuracy scores in recurrent forecasts for long-term predictions. The models evaluated include k-nearest neighbors (kNN) regression models, as well as GRU (gated recurrent unit) neural networks and LSTM (long short-term memory) autoencoder models. Results demonstrated that the kNN model achieved the greatest accuracy in forecasts with MSE scores over around 0.01 nm less than those of the GRU model and the LSTM autoencoder. Furthermore, it demonstrated that the kNN model accuracy increases with data size but can still forecast relatively well when trained on small amounts of data, having achieved MSE scores of around 0.02 nm when trained on 10,000 ns of simulation data. This study provides valuable information on the feasibility of accelerating the MD simulation process through training and predicting supervised ML models, which is particularly applicable in time-sensitive studies. GRAPHIC ABSTRACT: SARS-CoV-2 spike glycoprotein molecular dynamics simulation. Extraction and denoising of backbone RMSD data. Evaluation of k-nearest neighbors regression, GRU neural network, and LSTM autoencoder models in recurrent forecasting for long-term property predictions.
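The pipeline named in the abstract (FFT denoising, kNN regression, recurrent forecasting scored by MSE) can be illustrated with a minimal sketch. This is not the authors' implementation: the record gives no window length, neighbor count, or frequency cutoff, so `WINDOW`, `k`, and `keep_fraction` below are assumed values, the helper function names are hypothetical, and the RMSD trace is synthetic stand-in data rather than simulation output.

```python
import numpy as np


def fft_lowpass(signal, keep_fraction=0.05):
    """Denoise by zeroing all but the lowest-frequency FFT coefficients."""
    spectrum = np.fft.rfft(signal)
    cutoff = max(1, int(len(spectrum) * keep_fraction))  # assumed cutoff
    spectrum[cutoff:] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))


def make_windows(series, window):
    """Build (lagged window, next value) training pairs from a 1-D series."""
    X = np.stack([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    return X, y


def knn_predict(X_train, y_train, query, k=5):
    """kNN regression: average the targets of the k closest training windows."""
    dists = np.linalg.norm(X_train - query, axis=1)
    nearest = np.argsort(dists)[:k]
    return y_train[nearest].mean()


def recurrent_forecast(X_train, y_train, seed_window, steps, k=5):
    """Forecast `steps` values, feeding each prediction back as the newest input."""
    window = list(seed_window)
    predictions = []
    for _ in range(steps):
        pred = knn_predict(X_train, y_train, np.asarray(window), k)
        predictions.append(pred)
        window = window[1:] + [pred]  # slide the window forward by one step
    return np.asarray(predictions)


# Synthetic stand-in for a backbone RMSD trace (hypothetical data).
rng = np.random.default_rng(0)
t = np.arange(2000)
clean = 0.3 + 0.05 * np.sin(2 * np.pi * t / 200)
noisy = clean + rng.normal(0.0, 0.02, size=t.size)

denoised = fft_lowpass(noisy, keep_fraction=0.05)
train, test = denoised[:1600], denoised[1600:]

WINDOW = 50  # assumed lag length
X_train, y_train = make_windows(train, WINDOW)
forecast = recurrent_forecast(X_train, y_train, train[-WINDOW:], steps=len(test), k=5)

mse = float(np.mean((forecast - test) ** 2))
print(f"recurrent-forecast MSE: {mse:.6f}")
```

Because each kNN prediction is an average of training targets, the forecast can never leave the range of values seen in training. That may be acceptable for a bounded quantity such as RMSD, but it also means the model cannot extrapolate a trend.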
format Online
Article
Text
id pubmed-7888691
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Springer International Publishing
record_format MEDLINE/PubMed
spelling pubmed-7888691 2021-02-18 Supervised machine learning approach to molecular dynamics forecast of SARS-CoV-2 spike glycoproteins at varying temperatures. Liang, David; Song, Meichen; Niu, Ziyuan; Zhang, Peng; Rafailovich, Miriam; Deng, Yuefan. MRS Adv. Original Paper. Springer International Publishing 2021-02-17 2021 /pmc/articles/PMC7888691/ /pubmed/33619443 http://dx.doi.org/10.1557/s43580-021-00021-4 Text en
© The Author(s), under exclusive licence to The Materials Research Society 2021. This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
title Supervised machine learning approach to molecular dynamics forecast of SARS-CoV-2 spike glycoproteins at varying temperatures
topic Original Paper