Metaviromic identification of discriminative genomic features in SARS-CoV-2 using machine learning

The COVID-19 pandemic caused by SARS-CoV-2 has become a major threat across the globe. Here, we developed machine learning approaches to identify key pathogenic regions in coronavirus genomes. We trained and evaluated 7,562,625 models on 3,665 genomes including SARS-CoV-2, MERS-CoV, SARS-CoV, and ot...

Bibliographic Details
Main authors: Park, Jonathan J., Chen, Sidi
Format: Online Article Text
Language: English
Published: Elsevier 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8598947/
https://www.ncbi.nlm.nih.gov/pubmed/34812427
http://dx.doi.org/10.1016/j.patter.2021.100407
_version_ 1784600838174408704
author Park, Jonathan J.
Chen, Sidi
author_facet Park, Jonathan J.
Chen, Sidi
author_sort Park, Jonathan J.
collection PubMed
description The COVID-19 pandemic caused by SARS-CoV-2 has become a major threat across the globe. Here, we developed machine learning approaches to identify key pathogenic regions in coronavirus genomes. We trained and evaluated 7,562,625 models on 3,665 genomes including SARS-CoV-2, MERS-CoV, SARS-CoV, and other coronaviruses of human and animal origins to return quantitative and biologically interpretable signatures at nucleotide and amino acid resolutions. We identified hotspots across the SARS-CoV-2 genome, including previously unappreciated features in spike, RdRp, and other proteins. Finally, we integrated pathogenicity genomic profiles with B cell and T cell epitope predictions for enrichment of sequence targets to help guide vaccine development. These results provide a systematic map of predicted pathogenicity in SARS-CoV-2 that incorporates sequence, structural, and immunologic features, providing an unbiased collection of genetic elements for functional studies. This metavirome-based framework can also be applied for rapid characterization of new coronavirus strains or emerging pathogenic viruses.
format Online
Article
Text
id pubmed-8598947
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-8598947 2021-11-18 Metaviromic identification of discriminative genomic features in SARS-CoV-2 using machine learning Park, Jonathan J. Chen, Sidi Patterns (N Y) Article The COVID-19 pandemic caused by SARS-CoV-2 has become a major threat across the globe. Here, we developed machine learning approaches to identify key pathogenic regions in coronavirus genomes. We trained and evaluated 7,562,625 models on 3,665 genomes including SARS-CoV-2, MERS-CoV, SARS-CoV, and other coronaviruses of human and animal origins to return quantitative and biologically interpretable signatures at nucleotide and amino acid resolutions. We identified hotspots across the SARS-CoV-2 genome, including previously unappreciated features in spike, RdRp, and other proteins. Finally, we integrated pathogenicity genomic profiles with B cell and T cell epitope predictions for enrichment of sequence targets to help guide vaccine development. These results provide a systematic map of predicted pathogenicity in SARS-CoV-2 that incorporates sequence, structural, and immunologic features, providing an unbiased collection of genetic elements for functional studies. This metavirome-based framework can also be applied for rapid characterization of new coronavirus strains or emerging pathogenic viruses. Elsevier 2021-11-18 /pmc/articles/PMC8598947/ /pubmed/34812427 http://dx.doi.org/10.1016/j.patter.2021.100407 Text en © 2021 The Author(s) https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
title Metaviromic identification of discriminative genomic features in SARS-CoV-2 using machine learning
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8598947/
https://www.ncbi.nlm.nih.gov/pubmed/34812427
http://dx.doi.org/10.1016/j.patter.2021.100407