Identifying COVID-19 Severity-Related SARS-CoV-2 Mutation Using a Machine Learning Method

SARS-CoV-2 shows great evolutionary capacity through a high frequency of genomic variation during transmission. Evolved SARS-CoV-2 often demonstrates resistance to previous vaccines and can cause poor clinical status in patients. Mutations in the SARS-CoV-2 genome involve mutations in structural and...

Bibliographic Details
Main authors: Huang, Feiming, Chen, Lei, Guo, Wei, Zhou, Xianchao, Feng, Kaiyan, Huang, Tao, Cai, Yudong
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9225528/
https://www.ncbi.nlm.nih.gov/pubmed/35743837
http://dx.doi.org/10.3390/life12060806
author Huang, Feiming
Chen, Lei
Guo, Wei
Zhou, Xianchao
Feng, Kaiyan
Huang, Tao
Cai, Yudong
author_sort Huang, Feiming
collection PubMed
description SARS-CoV-2 shows great evolutionary capacity through a high frequency of genomic variation during transmission. Evolved SARS-CoV-2 often demonstrates resistance to previous vaccines and can cause poor clinical status in patients. Mutations in the SARS-CoV-2 genome affect both structural and nonstructural proteins, and some of these proteins, such as the spike protein, have been shown to be directly associated with the clinical status of patients with severe COVID-19 pneumonia. In this study, we collected genome-wide mutation information on virulent strains along with the severity of COVID-19 pneumonia in patients, which varied with their clinical status. Important protein mutations and untranslated region mutations were extracted using machine learning methods. First, through Boruta and four ranking algorithms (least absolute shrinkage and selection operator, light gradient boosting machine, max-relevance and min-redundancy, and Monte Carlo feature selection), mutations that were highly correlated with the clinical status of the patients were screened out and sorted into four feature lists. Some mutations, such as D614G and V1176F, were shown to be associated with viral infectivity. Moreover, previously unreported mutations such as A320V and I164ILV of nsp14 were also identified, which suggests their potential roles. We then applied the incremental feature selection method to each feature list to construct efficient classifiers, which can be used directly to distinguish the clinical status of COVID-19 patients. In addition, four sets of quantitative rules were set up, which make the role of each mutation in differentiating the clinical status of COVID-19 patients easier to understand. The identified key mutations linked to virologic properties will help clarify the mechanisms of infection and aid the development of antiviral treatments.
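The incremental feature selection step named in the abstract can be illustrated with a minimal sketch. It assumes a feature list that has already been ranked (for example by one of the four ranking algorithms) and evaluates classifiers built on the top 1, top 2, ... features, keeping the shortest prefix with the best score. The nearest-centroid classifier, the leave-one-out evaluation, the toy mutation matrix, and all function names here are illustrative assumptions, not the classifiers, validation scheme, or data used in the study.

```python
from typing import List, Sequence, Tuple


def nearest_centroid_predict(train_X: List[List[float]],
                             train_y: List[str],
                             x: List[float]) -> str:
    """Predict the label whose class centroid is closest to x (stand-in classifier)."""
    sums, counts = {}, {}
    for row, label in zip(train_X, train_y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    best_label, best_dist = "", float("inf")
    for label in sorted(sums):  # sorted for deterministic tie-breaking
        centroid = [s / counts[label] for s in sums[label]]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_accuracy(X: Sequence[Sequence[float]],
                 y: Sequence[str],
                 features: Sequence[int]) -> float:
    """Leave-one-out accuracy using only the given feature columns."""
    correct = 0
    for i in range(len(X)):
        train_X = [[row[f] for f in features] for k, row in enumerate(X) if k != i]
        train_y = [label for k, label in enumerate(y) if k != i]
        held_out = [X[i][f] for f in features]
        if nearest_centroid_predict(train_X, train_y, held_out) == y[i]:
            correct += 1
    return correct / len(X)


def incremental_feature_selection(X: Sequence[Sequence[float]],
                                  y: Sequence[str],
                                  ranked_features: Sequence[int]
                                  ) -> Tuple[List[int], float, List[float]]:
    """Score every top-k prefix of the ranked list; return the best prefix."""
    best_k, best_score, curve = 0, -1.0, []
    for k in range(1, len(ranked_features) + 1):
        score = loo_accuracy(X, y, ranked_features[:k])
        curve.append(score)
        if score > best_score:  # strict '>' keeps the shortest best prefix
            best_k, best_score = k, score
    return list(ranked_features[:best_k]), best_score, curve


# Hypothetical toy data: rows are samples, columns are 0/1 mutation indicators.
X_toy = [
    [1, 0, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
    [0, 0, 1],
    [0, 1, 0],
]
y_toy = ["severe", "severe", "severe", "mild", "mild", "mild"]
ranked_toy = [0, 2, 1]  # assumed output of some ranking algorithm

selected, best, curve = incremental_feature_selection(X_toy, y_toy, ranked_toy)
print("selected feature columns:", selected, "best LOO accuracy:", best)
```

In the study, the procedure was applied to each of the four ranked feature lists; in this sketch that would correspond to calling `incremental_feature_selection` once per list and comparing the resulting classifiers.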
format Online
Article
Text
id pubmed-9225528
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9225528 2022-06-24 Life (Basel) Article MDPI 2022-05-28 /pmc/articles/PMC9225528/ /pubmed/35743837 http://dx.doi.org/10.3390/life12060806 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Identifying COVID-19 Severity-Related SARS-CoV-2 Mutation Using a Machine Learning Method
topic Article