
Deep sequence modelling for predicting COVID-19 mRNA vaccine degradation

Bibliographic Details
Main Authors: Qaid, Talal S., Mazaar, Hussein, Alqahtani, Mohammed S., Raweh, Abeer A., Alakwaa, Wafaa
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2021
Subjects: Bioinformatics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8237341/
https://www.ncbi.nlm.nih.gov/pubmed/34239977
http://dx.doi.org/10.7717/peerj-cs.597
_version_ 1783714710391095296
author Qaid, Talal S.
Mazaar, Hussein
Alqahtani, Mohammed S.
Raweh, Abeer A.
Alakwaa, Wafaa
author_facet Qaid, Talal S.
Mazaar, Hussein
Alqahtani, Mohammed S.
Raweh, Abeer A.
Alakwaa, Wafaa
author_sort Qaid, Talal S.
collection PubMed
description The worldwide coronavirus (COVID-19) pandemic spread dramatically and rapidly in 2020 and requires an urgent global effort to accelerate the development of a vaccine to stop the daily infections and deaths. Several types of vaccine have been designed to teach the immune system how to fight off certain kinds of pathogens. mRNA vaccines are the most important candidate vaccines because of their capacity for rapid development, high potency, safe administration and potential for low-cost manufacture. An mRNA vaccine acts by training the body to recognize and respond to the proteins produced by disease-causing organisms such as viruses or bacteria. This type of vaccine is the fastest candidate to treat COVID-19, but it currently faces several limitations. In particular, it is a challenge to design stable mRNA molecules because of the inefficient in vivo delivery of mRNA, its tendency for spontaneous degradation and low protein expression levels. This work designed and implemented a deep sequence model based on bidirectional GRU and LSTM models, applied to the Stanford COVID-19 mRNA vaccine dataset, to identify the mRNA sequences responsible for degradation by predicting five reactivity values for every position in the sequence. Four of these values determine the likelihood of degradation with/without magnesium at high pH (pH 10) and high temperature (50 degrees Celsius), and the fifth reactivity value is used to determine the likely secondary structure of the RNA sample. The model relies on two types of features, numerical and categorical. The categorical features are extracted from the mRNA sequences, structure and predicted loop; they are encoded as numbers and then passed through a learned embedding layer. There are five numerical features, which depend on the likelihood for each pair of nucleotides in the RNA. The model gives promising results: it predicts the five reactivity values with a validation mean columnwise root mean square error (MCRMSE) of 0.125 using the LSTM model with augmentation and the codon encoding method. With the LSTM model, codon encoding outperforms base encoding in validation MCRMSE, whereas base encoding shows less over-fitting, with a difference of 0.008 between the training and validation loss.
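The MCRMSE metric named in the description can be illustrated with a minimal sketch. It assumes the standard definition (compute the RMSE separately for each target column, then average across columns); the function name `mcrmse` and the toy two-column data are hypothetical and are not taken from the article, which scores five target columns.

```python
import math


def mcrmse(y_true, y_pred):
    """Mean columnwise RMSE.

    y_true, y_pred: lists of rows; each row holds one value per target column.
    (Hypothetical helper for illustration, not the authors' code.)
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and have the same number of rows")
    n_rows = len(y_true)
    n_cols = len(y_true[0])
    column_rmse = []
    for j in range(n_cols):
        # Sum of squared errors for column j over all rows (positions).
        squared_error = sum((t[j] - p[j]) ** 2 for t, p in zip(y_true, y_pred))
        column_rmse.append(math.sqrt(squared_error / n_rows))
    # Average the per-column RMSE values.
    return sum(column_rmse) / n_cols


# Toy data with two target columns: column 0 is off by 3 everywhere, column 1 is exact.
y_true = [[0.0, 1.0], [0.0, 1.0]]
y_pred = [[3.0, 1.0], [3.0, 1.0]]
score = mcrmse(y_true, y_pred)  # per-column RMSE is 3.0 and 0.0, so the mean is 1.5
```

Averaging per-column RMSE values, rather than pooling all errors into a single RMSE, gives each target equal weight regardless of its scale.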
format Online
Article
Text
id pubmed-8237341
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-8237341 2021-07-07 Deep sequence modelling for predicting COVID-19 mRNA vaccine degradation Qaid, Talal S. Mazaar, Hussein Alqahtani, Mohammed S. Raweh, Abeer A. Alakwaa, Wafaa PeerJ Comput Sci Bioinformatics PeerJ Inc. 2021-06-22 /pmc/articles/PMC8237341/ /pubmed/34239977 http://dx.doi.org/10.7717/peerj-cs.597 Text en © 2021 Qaid et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
spellingShingle Bioinformatics
Qaid, Talal S.
Mazaar, Hussein
Alqahtani, Mohammed S.
Raweh, Abeer A.
Alakwaa, Wafaa
Deep sequence modelling for predicting COVID-19 mRNA vaccine degradation
title Deep sequence modelling for predicting COVID-19 mRNA vaccine degradation
title_full Deep sequence modelling for predicting COVID-19 mRNA vaccine degradation
title_fullStr Deep sequence modelling for predicting COVID-19 mRNA vaccine degradation
title_full_unstemmed Deep sequence modelling for predicting COVID-19 mRNA vaccine degradation
title_short Deep sequence modelling for predicting COVID-19 mRNA vaccine degradation
title_sort deep sequence modelling for predicting covid-19 mrna vaccine degradation
topic Bioinformatics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8237341/
https://www.ncbi.nlm.nih.gov/pubmed/34239977
http://dx.doi.org/10.7717/peerj-cs.597
work_keys_str_mv AT qaidtalals deepsequencemodellingforpredictingcovid19mrnavaccinedegradation
AT mazaarhussein deepsequencemodellingforpredictingcovid19mrnavaccinedegradation
AT alqahtanimohammeds deepsequencemodellingforpredictingcovid19mrnavaccinedegradation
AT rawehabeera deepsequencemodellingforpredictingcovid19mrnavaccinedegradation
AT alakwaawafaa deepsequencemodellingforpredictingcovid19mrnavaccinedegradation