Deep sequence modelling for predicting COVID-19 mRNA vaccine degradation
The worldwide coronavirus (COVID-19) pandemic made dramatic and rapid progress in the year 2020 and requires urgent global effort to accelerate the development of a vaccine to stop the daily infections and deaths. Several types of vaccine have been designed to teach the immune system how to fight of...
Main authors: | Qaid, Talal S., Mazaar, Hussein, Alqahtani, Mohammed S., Raweh, Abeer A., Alakwaa, Wafaa |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | PeerJ Inc. 2021 |
Subjects: | Bioinformatics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8237341/ https://www.ncbi.nlm.nih.gov/pubmed/34239977 http://dx.doi.org/10.7717/peerj-cs.597 |
_version_ | 1783714710391095296 |
---|---|
author | Qaid, Talal S. Mazaar, Hussein Alqahtani, Mohammed S. Raweh, Abeer A. Alakwaa, Wafaa |
author_facet | Qaid, Talal S. Mazaar, Hussein Alqahtani, Mohammed S. Raweh, Abeer A. Alakwaa, Wafaa |
author_sort | Qaid, Talal S. |
collection | PubMed |
description | The worldwide coronavirus (COVID-19) pandemic progressed dramatically and rapidly in 2020 and requires an urgent global effort to accelerate the development of a vaccine to stop the daily infections and deaths. Several types of vaccine have been designed to teach the immune system how to fight off certain kinds of pathogens. mRNA vaccines are the most important candidate vaccines because of their capacity for rapid development, high potency, safe administration and potential for low-cost manufacture. An mRNA vaccine acts by training the body to recognize and respond to the proteins produced by disease-causing organisms such as viruses or bacteria. This type of vaccine is the fastest candidate to treat COVID-19, but it currently faces several limitations. In particular, it is a challenge to design stable mRNA molecules because of the inefficient in vivo delivery of mRNA, its tendency for spontaneous degradation and low protein expression levels. This work designed and implemented a deep sequence model based on bidirectional GRU and LSTM models, applied to the Stanford COVID-19 mRNA vaccine dataset, to predict the mRNA sequences responsible for degradation by predicting five reactivity values for every position in the sequence. Four of these values determine the likelihood of degradation with/without magnesium at high pH (pH 10) and high temperature (50 degrees Celsius), and the fifth reactivity value is used to determine the likely secondary structure of the RNA sample. The model relies on two types of features, numerical and categorical, where the categorical features are extracted from the mRNA sequence, structure and predicted loop type. These categorical features are encoded as numbers and then passed through a learned embedding layer. There are five numerical features, which depend on the base-pairing likelihood for each pair of nucleotides in the RNA. The model gives promising results: it predicts the five reactivity values with a validation mean columnwise root mean square error (MCRMSE) of 0.125 using the LSTM model with augmentation and the codon encoding method. Codon encoding outperforms base encoding in MCRMSE validation error using the LSTM model, whereas base encoding shows less over-fitting, with a difference of 0.008 between the training and validation loss. |
format | Online Article Text |
id | pubmed-8237341 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | PeerJ Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-8237341 2021-07-07 Deep sequence modelling for predicting COVID-19 mRNA vaccine degradation Qaid, Talal S. Mazaar, Hussein Alqahtani, Mohammed S. Raweh, Abeer A. Alakwaa, Wafaa PeerJ Comput Sci Bioinformatics PeerJ Inc. 2021-06-22 /pmc/articles/PMC8237341/ /pubmed/34239977 http://dx.doi.org/10.7717/peerj-cs.597 Text en © 2021 Qaid et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited. |
spellingShingle | Bioinformatics Qaid, Talal S. Mazaar, Hussein Alqahtani, Mohammed S. Raweh, Abeer A. Alakwaa, Wafaa Deep sequence modelling for predicting COVID-19 mRNA vaccine degradation |
title | Deep sequence modelling for predicting COVID-19 mRNA vaccine degradation |
title_sort | deep sequence modelling for predicting covid-19 mrna vaccine degradation |
topic | Bioinformatics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8237341/ https://www.ncbi.nlm.nih.gov/pubmed/34239977 http://dx.doi.org/10.7717/peerj-cs.597 |
work_keys_str_mv | AT qaidtalals deepsequencemodellingforpredictingcovid19mrnavaccinedegradation AT mazaarhussein deepsequencemodellingforpredictingcovid19mrnavaccinedegradation AT alqahtanimohammeds deepsequencemodellingforpredictingcovid19mrnavaccinedegradation AT rawehabeera deepsequencemodellingforpredictingcovid19mrnavaccinedegradation AT alakwaawafaa deepsequencemodellingforpredictingcovid19mrnavaccinedegradation |
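
The description reports its results as a mean columnwise root mean square error (MCRMSE): the root mean square error is computed separately for each target column and the per-column values are then averaged. The following is a minimal sketch of that metric in Python with NumPy, not the authors' implementation. The function name `mcrmse`, the array layout (samples, positions, targets) and the toy data are assumptions made for illustration only.

```python
import numpy as np


def mcrmse(y_true, y_pred):
    """Mean columnwise RMSE: RMSE per target column, averaged over columns.

    Assumed layout (hypothetical): the last axis indexes the target columns,
    e.g. the five reactivity/degradation values; all leading axes
    (samples, sequence positions) are treated as observations.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")

    n_targets = y_true.shape[-1]
    # Collapse every leading axis so each row is one observation.
    diff = (y_true - y_pred).reshape(-1, n_targets)
    # One RMSE per target column, then the mean across columns.
    rmse_per_column = np.sqrt(np.mean(diff ** 2, axis=0))
    return float(np.mean(rmse_per_column))


# Toy data (made up): 2 sequences, 3 positions, 5 target columns.
y_true = np.zeros((2, 3, 5))
y_pred = np.zeros((2, 3, 5))
y_pred[..., 0] = 1.0  # only the first column is wrong, by exactly 1 everywhere

# Column RMSEs are [1, 0, 0, 0, 0], so the mean over five columns is 0.2.
toy_score = mcrmse(y_true, y_pred)
```

Averaging per-column errors rather than pooling all residuals keeps a target with a large scale from dominating the score, which matters when the five predicted values are measured under different conditions.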