Cargando…

Can machines learn the mutation signatures of SARS-CoV-2 and enable viral-genotype guided predictive prognosis?

MOTIVATION: Continuous emergence of new variants through appearance/accumulation/disappearance of mutations is a hallmark of many viral diseases. SARS-CoV-2 variants have particularly exerted tremendous pressure on global healthcare system owing to their life threatening and debilitating implication...

Descripción completa

Detalles Bibliográficos
Autores principales: Nagpal, Sunil, Pinna, Nishal Kumar, Pant, Namrata, Singh, Rohan, Srivastava, Divyanshu, Mande, Sharmila S.
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Elsevier Ltd. 2022
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9188262/
https://www.ncbi.nlm.nih.gov/pubmed/35700770
http://dx.doi.org/10.1016/j.jmb.2022.167684
Descripción
Sumario:MOTIVATION: Continuous emergence of new variants through appearance/accumulation/disappearance of mutations is a hallmark of many viral diseases. SARS-CoV-2 variants have particularly exerted tremendous pressure on global healthcare system owing to their life threatening and debilitating implications. The sheer plurality of variants and huge scale of genomic data have added to the challenges of tracing the mutations/variants and their relationship to infection severity (if any). RESULTS: We explored the suitability of virus-genotype guided machine-learning in infection prognosis and identification of features/mutations-of-interest. Total 199,519 outcome-traced genomes, representing 45,625 nucleotide-mutations, were employed. Among these, post data-cleaning, Low and High severity genomes were classified using an integrated model (employing virus genotype, epitopic-influence and patient-age) with consistently high ROC-AUC (Asia:0.97 ± 0.01, Europe:0.94 ± 0.01, N.America:0.92 ± 0.02, Africa:0.94 ± 0.07, S.America:0.93 ± 03). Although virus-genotype alone could enable high predictivity (0.97 ± 0.01, 0.89 ± 0.02, 0.86 ± 0.04, 0.95 ± 0.06, 0.9 ± 0.04), the performance was not found to be consistent and the models for a few geographies displayed significant improvement in predictivity when the influence of age and/or epitope was incorporated with virus-genotype (Wilcoxon p_BH < 0.05). Neither age or epitopic-influence or clade information could out-perform the integrated features. A sparse model (6 features), developed using patient-age and epitopic-influence of the mutations, performed reasonably well (>0.87 ± 0.03, 0.91 ± 0.01, 0.87 ± 0.03, 0.84 ± 0.08, 0.89 ± 0.05). High-performance models were employed for inferring the important mutations-of-interest using Shapley Additive exPlanations (SHAP). The changes in HLA interactions of the mutated epitopes of reference SARS-CoV-2 were then subsequently probed. Notably, we also describe the significance of a ‘temporal-modeling approach’ to benchmark the models linked with continuously evolving pathogens. We conclude that while machine learning can play a vital role in identifying relevant mutations and factors driving the severity, caution should be exercised in using the genotypic signatures for predictive prognosis.