Can machines learn the mutation signatures of SARS-CoV-2 and enable viral-genotype guided predictive prognosis?
MOTIVATION: Continuous emergence of new variants through appearance/accumulation/disappearance of mutations is a hallmark of many viral diseases. SARS-CoV-2 variants have exerted particularly heavy pressure on the global healthcare system owing to their life-threatening and debilitating implication...
Main Authors: | Nagpal, Sunil; Pinna, Nishal Kumar; Pant, Namrata; Singh, Rohan; Srivastava, Divyanshu; Mande, Sharmila S. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Elsevier Ltd. 2022 |
Subjects: | Research Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9188262/ https://www.ncbi.nlm.nih.gov/pubmed/35700770 http://dx.doi.org/10.1016/j.jmb.2022.167684 |
_version_ | 1784725336039096320 |
---|---|
author | Nagpal, Sunil; Pinna, Nishal Kumar; Pant, Namrata; Singh, Rohan; Srivastava, Divyanshu; Mande, Sharmila S. |
author_facet | Nagpal, Sunil; Pinna, Nishal Kumar; Pant, Namrata; Singh, Rohan; Srivastava, Divyanshu; Mande, Sharmila S. |
author_sort | Nagpal, Sunil |
collection | PubMed |
description | MOTIVATION: Continuous emergence of new variants through appearance/accumulation/disappearance of mutations is a hallmark of many viral diseases. SARS-CoV-2 variants have exerted particularly heavy pressure on the global healthcare system owing to their life-threatening and debilitating implications. The sheer plurality of variants and the huge scale of genomic data have added to the challenges of tracing the mutations/variants and their relationship to infection severity (if any). RESULTS: We explored the suitability of virus-genotype guided machine learning for infection prognosis and for identification of features/mutations-of-interest. A total of 199,519 outcome-traced genomes, representing 45,625 nucleotide mutations, were employed. Among these, after data cleaning, Low and High severity genomes were classified using an integrated model (employing virus genotype, epitopic-influence and patient-age) with consistently high ROC-AUC (Asia: 0.97 ± 0.01, Europe: 0.94 ± 0.01, N. America: 0.92 ± 0.02, Africa: 0.94 ± 0.07, S. America: 0.93 ± 0.03). Although virus-genotype alone could enable high predictivity (0.97 ± 0.01, 0.89 ± 0.02, 0.86 ± 0.04, 0.95 ± 0.06, 0.9 ± 0.04), the performance was not consistent, and the models for a few geographies displayed significant improvement in predictivity when the influence of age and/or epitope was incorporated with virus-genotype (Wilcoxon p_BH < 0.05). Neither age nor epitopic-influence nor clade information could outperform the integrated features. A sparse model (6 features), developed using patient-age and epitopic-influence of the mutations, performed reasonably well (>0.87 ± 0.03, 0.91 ± 0.01, 0.87 ± 0.03, 0.84 ± 0.08, 0.89 ± 0.05). High-performance models were employed for inferring the important mutations-of-interest using Shapley Additive exPlanations (SHAP). The changes in HLA interactions of the mutated epitopes of reference SARS-CoV-2 were subsequently probed. Notably, we also describe the significance of a ‘temporal-modeling approach’ to benchmark the models linked with continuously evolving pathogens. We conclude that while machine learning can play a vital role in identifying relevant mutations and factors driving the severity, caution should be exercised in using the genotypic signatures for predictive prognosis. |
format | Online Article Text |
id | pubmed-9188262 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Elsevier Ltd. |
record_format | MEDLINE/PubMed |
spelling | pubmed-9188262 2022-06-13 Can machines learn the mutation signatures of SARS-CoV-2 and enable viral-genotype guided predictive prognosis? Nagpal, Sunil; Pinna, Nishal Kumar; Pant, Namrata; Singh, Rohan; Srivastava, Divyanshu; Mande, Sharmila S. J Mol Biol Research Article MOTIVATION: Continuous emergence of new variants through appearance/accumulation/disappearance of mutations is a hallmark of many viral diseases. SARS-CoV-2 variants have exerted particularly heavy pressure on the global healthcare system owing to their life-threatening and debilitating implications. The sheer plurality of variants and the huge scale of genomic data have added to the challenges of tracing the mutations/variants and their relationship to infection severity (if any). RESULTS: We explored the suitability of virus-genotype guided machine learning for infection prognosis and for identification of features/mutations-of-interest. A total of 199,519 outcome-traced genomes, representing 45,625 nucleotide mutations, were employed. Among these, after data cleaning, Low and High severity genomes were classified using an integrated model (employing virus genotype, epitopic-influence and patient-age) with consistently high ROC-AUC (Asia: 0.97 ± 0.01, Europe: 0.94 ± 0.01, N. America: 0.92 ± 0.02, Africa: 0.94 ± 0.07, S. America: 0.93 ± 0.03). Although virus-genotype alone could enable high predictivity (0.97 ± 0.01, 0.89 ± 0.02, 0.86 ± 0.04, 0.95 ± 0.06, 0.9 ± 0.04), the performance was not consistent, and the models for a few geographies displayed significant improvement in predictivity when the influence of age and/or epitope was incorporated with virus-genotype (Wilcoxon p_BH < 0.05). Neither age nor epitopic-influence nor clade information could outperform the integrated features. A sparse model (6 features), developed using patient-age and epitopic-influence of the mutations, performed reasonably well (>0.87 ± 0.03, 0.91 ± 0.01, 0.87 ± 0.03, 0.84 ± 0.08, 0.89 ± 0.05). High-performance models were employed for inferring the important mutations-of-interest using Shapley Additive exPlanations (SHAP). The changes in HLA interactions of the mutated epitopes of reference SARS-CoV-2 were subsequently probed. Notably, we also describe the significance of a ‘temporal-modeling approach’ to benchmark the models linked with continuously evolving pathogens. We conclude that while machine learning can play a vital role in identifying relevant mutations and factors driving the severity, caution should be exercised in using the genotypic signatures for predictive prognosis. Elsevier Ltd. 2022-08-15 2022-06-11 /pmc/articles/PMC9188262/ /pubmed/35700770 http://dx.doi.org/10.1016/j.jmb.2022.167684 Text en © 2022 Elsevier Ltd. All rights reserved. Since January 2020 Elsevier has created a COVID-19 resource centre with free information in English and Mandarin on the novel coronavirus COVID-19. The COVID-19 resource centre is hosted on Elsevier Connect, the company's public news and information website. Elsevier hereby grants permission to make all its COVID-19-related research that is available on the COVID-19 resource centre - including this research content - immediately available in PubMed Central and other publicly funded repositories, such as the WHO COVID database with rights for unrestricted research re-use and analyses in any form or by any means with acknowledgement of the original source. These permissions are granted for free by Elsevier for as long as the COVID-19 resource centre remains active. |
spellingShingle | Research Article Nagpal, Sunil Pinna, Nishal Kumar Pant, Namrata Singh, Rohan Srivastava, Divyanshu Mande, Sharmila S. Can machines learn the mutation signatures of SARS-CoV-2 and enable viral-genotype guided predictive prognosis? |
title | Can machines learn the mutation signatures of SARS-CoV-2 and enable viral-genotype guided predictive prognosis? |
title_full | Can machines learn the mutation signatures of SARS-CoV-2 and enable viral-genotype guided predictive prognosis? |
title_fullStr | Can machines learn the mutation signatures of SARS-CoV-2 and enable viral-genotype guided predictive prognosis? |
title_full_unstemmed | Can machines learn the mutation signatures of SARS-CoV-2 and enable viral-genotype guided predictive prognosis? |
title_short | Can machines learn the mutation signatures of SARS-CoV-2 and enable viral-genotype guided predictive prognosis? |
title_sort | can machines learn the mutation signatures of sars-cov-2 and enable viral-genotype guided predictive prognosis? |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9188262/ https://www.ncbi.nlm.nih.gov/pubmed/35700770 http://dx.doi.org/10.1016/j.jmb.2022.167684 |
work_keys_str_mv | AT nagpalsunil canmachineslearnthemutationsignaturesofsarscov2andenableviralgenotypeguidedpredictiveprognosis AT pinnanishalkumar canmachineslearnthemutationsignaturesofsarscov2andenableviralgenotypeguidedpredictiveprognosis AT pantnamrata canmachineslearnthemutationsignaturesofsarscov2andenableviralgenotypeguidedpredictiveprognosis AT singhrohan canmachineslearnthemutationsignaturesofsarscov2andenableviralgenotypeguidedpredictiveprognosis AT srivastavadivyanshu canmachineslearnthemutationsignaturesofsarscov2andenableviralgenotypeguidedpredictiveprognosis AT mandesharmilas canmachineslearnthemutationsignaturesofsarscov2andenableviralgenotypeguidedpredictiveprognosis |
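
Note on "Wilcoxon p_BH < 0.05" in the description: the subscript BH conventionally denotes p-values adjusted with the Benjamini-Hochberg procedure for multiple comparisons. The record does not say how the authors computed the adjustment, so the following is only a minimal sketch of the standard procedure. The function name `benjamini_hochberg` and the raw p-values are hypothetical and are not taken from the article.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to the smallest. Each p-value is
    # scaled by m / rank, and the running minimum keeps the adjusted values
    # monotone and capped at 1.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values (illustrative only, not from the article).
raw = [0.001, 0.008, 0.039, 0.041, 0.6]
adj = benjamini_hochberg(raw)
significant = [p < 0.05 for p in adj]
```

With these made-up inputs, the third and fourth comparisons have raw p-values below 0.05 but adjusted values above it, which is the situation the correction is meant to catch when several comparisons (for example, one per geography) are tested together.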