Can machines learn the mutation signatures of SARS-CoV-2 and enable viral-genotype guided predictive prognosis?

Bibliographic Details
Main authors: Nagpal, Sunil; Pinna, Nishal Kumar; Pant, Namrata; Singh, Rohan; Srivastava, Divyanshu; Mande, Sharmila S.
Format: Online Article Text
Language: English
Published: Elsevier Ltd. 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9188262/
https://www.ncbi.nlm.nih.gov/pubmed/35700770
http://dx.doi.org/10.1016/j.jmb.2022.167684
author Nagpal, Sunil
Pinna, Nishal Kumar
Pant, Namrata
Singh, Rohan
Srivastava, Divyanshu
Mande, Sharmila S.
collection PubMed
description MOTIVATION: The continuous emergence of new variants through the appearance, accumulation, and disappearance of mutations is a hallmark of many viral diseases. SARS-CoV-2 variants in particular have exerted tremendous pressure on the global healthcare system owing to their life-threatening and debilitating implications. The sheer plurality of variants and the huge scale of genomic data have added to the challenge of tracing mutations and variants and their relationship (if any) to infection severity.
RESULTS: We explored the suitability of virus-genotype guided machine learning for infection prognosis and for identifying features and mutations of interest. A total of 199,519 outcome-traced genomes, representing 45,625 nucleotide mutations, were employed. After data cleaning, Low and High severity genomes among these were classified using an integrated model (employing virus genotype, epitopic influence, and patient age) with consistently high ROC-AUC (Asia: 0.97 ± 0.01, Europe: 0.94 ± 0.01, N. America: 0.92 ± 0.02, Africa: 0.94 ± 0.07, S. America: 0.93 ± 0.03). Although virus genotype alone could enable high predictivity (0.97 ± 0.01, 0.89 ± 0.02, 0.86 ± 0.04, 0.95 ± 0.06, 0.9 ± 0.04), its performance was not consistent, and the models for a few geographies showed significant improvement in predictivity when the influence of age and/or epitope was incorporated with virus genotype (Wilcoxon p_BH < 0.05). Neither age, epitopic influence, nor clade information could outperform the integrated features. A sparse model (6 features), developed using patient age and the epitopic influence of the mutations, performed reasonably well (>0.87 ± 0.03, 0.91 ± 0.01, 0.87 ± 0.03, 0.84 ± 0.08, 0.89 ± 0.05). High-performance models were employed to infer the important mutations of interest using Shapley Additive exPlanations (SHAP). The changes in HLA interactions of the mutated epitopes of reference SARS-CoV-2 were subsequently probed. Notably, we also describe the significance of a ‘temporal-modeling approach’ for benchmarking models linked with continuously evolving pathogens. We conclude that while machine learning can play a vital role in identifying relevant mutations and factors driving severity, caution should be exercised in using genotypic signatures for predictive prognosis.
format Online
Article
Text
id pubmed-9188262
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Elsevier Ltd.
record_format MEDLINE/PubMed
spelling pubmed-9188262 2022-06-13 Can machines learn the mutation signatures of SARS-CoV-2 and enable viral-genotype guided predictive prognosis? Nagpal, Sunil; Pinna, Nishal Kumar; Pant, Namrata; Singh, Rohan; Srivastava, Divyanshu; Mande, Sharmila S. J Mol Biol Research Article Elsevier Ltd. 2022-08-15 2022-06-11 /pmc/articles/PMC9188262/ /pubmed/35700770 http://dx.doi.org/10.1016/j.jmb.2022.167684 Text en © 2022 Elsevier Ltd. All rights reserved. Since January 2020 Elsevier has created a COVID-19 resource centre with free information in English and Mandarin on the novel coronavirus COVID-19. The COVID-19 resource centre is hosted on Elsevier Connect, the company's public news and information website. Elsevier hereby grants permission to make all its COVID-19-related research that is available on the COVID-19 resource centre - including this research content - immediately available in PubMed Central and other publicly funded repositories, such as the WHO COVID database with rights for unrestricted research re-use and analyses in any form or by any means with acknowledgement of the original source. These permissions are granted for free by Elsevier for as long as the COVID-19 resource centre remains active.
title Can machines learn the mutation signatures of SARS-CoV-2 and enable viral-genotype guided predictive prognosis?
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9188262/
https://www.ncbi.nlm.nih.gov/pubmed/35700770
http://dx.doi.org/10.1016/j.jmb.2022.167684