Metaviromic identification of discriminative genomic features in SARS-CoV-2 using machine learning
The COVID-19 pandemic caused by SARS-CoV-2 has become a major threat across the globe. Here, we developed machine learning approaches to identify key pathogenic regions in coronavirus genomes. We trained and evaluated 7,562,625 models on 3,665 genomes including SARS-CoV-2, MERS-CoV, SARS-CoV, and ot...
Main authors: | Park, Jonathan J.; Chen, Sidi |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Elsevier 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8598947/ https://www.ncbi.nlm.nih.gov/pubmed/34812427 http://dx.doi.org/10.1016/j.patter.2021.100407 |
_version_ | 1784600838174408704 |
---|---|
author | Park, Jonathan J. Chen, Sidi |
author_facet | Park, Jonathan J. Chen, Sidi |
author_sort | Park, Jonathan J. |
collection | PubMed |
description | The COVID-19 pandemic caused by SARS-CoV-2 has become a major threat across the globe. Here, we developed machine learning approaches to identify key pathogenic regions in coronavirus genomes. We trained and evaluated 7,562,625 models on 3,665 genomes including SARS-CoV-2, MERS-CoV, SARS-CoV, and other coronaviruses of human and animal origins to return quantitative and biologically interpretable signatures at nucleotide and amino acid resolutions. We identified hotspots across the SARS-CoV-2 genome, including previously unappreciated features in spike, RdRp, and other proteins. Finally, we integrated pathogenicity genomic profiles with B cell and T cell epitope predictions for enrichment of sequence targets to help guide vaccine development. These results provide a systematic map of predicted pathogenicity in SARS-CoV-2 that incorporates sequence, structural, and immunologic features, providing an unbiased collection of genetic elements for functional studies. This metavirome-based framework can also be applied for rapid characterization of new coronavirus strains or emerging pathogenic viruses. |
format | Online Article Text |
id | pubmed-8598947 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Elsevier |
record_format | MEDLINE/PubMed |
spelling | pubmed-8598947 2021-11-18 Metaviromic identification of discriminative genomic features in SARS-CoV-2 using machine learning Park, Jonathan J. Chen, Sidi Patterns (N Y) Article The COVID-19 pandemic caused by SARS-CoV-2 has become a major threat across the globe. Here, we developed machine learning approaches to identify key pathogenic regions in coronavirus genomes. We trained and evaluated 7,562,625 models on 3,665 genomes including SARS-CoV-2, MERS-CoV, SARS-CoV, and other coronaviruses of human and animal origins to return quantitative and biologically interpretable signatures at nucleotide and amino acid resolutions. We identified hotspots across the SARS-CoV-2 genome, including previously unappreciated features in spike, RdRp, and other proteins. Finally, we integrated pathogenicity genomic profiles with B cell and T cell epitope predictions for enrichment of sequence targets to help guide vaccine development. These results provide a systematic map of predicted pathogenicity in SARS-CoV-2 that incorporates sequence, structural, and immunologic features, providing an unbiased collection of genetic elements for functional studies. This metavirome-based framework can also be applied for rapid characterization of new coronavirus strains or emerging pathogenic viruses. Elsevier 2021-11-18 /pmc/articles/PMC8598947/ /pubmed/34812427 http://dx.doi.org/10.1016/j.patter.2021.100407 Text en © 2021 The Author(s) https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). |
spellingShingle | Article Park, Jonathan J. Chen, Sidi Metaviromic identification of discriminative genomic features in SARS-CoV-2 using machine learning |
title | Metaviromic identification of discriminative genomic features in SARS-CoV-2 using machine learning |
title_full | Metaviromic identification of discriminative genomic features in SARS-CoV-2 using machine learning |
title_fullStr | Metaviromic identification of discriminative genomic features in SARS-CoV-2 using machine learning |
title_full_unstemmed | Metaviromic identification of discriminative genomic features in SARS-CoV-2 using machine learning |
title_short | Metaviromic identification of discriminative genomic features in SARS-CoV-2 using machine learning |
title_sort | metaviromic identification of discriminative genomic features in sars-cov-2 using machine learning |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8598947/ https://www.ncbi.nlm.nih.gov/pubmed/34812427 http://dx.doi.org/10.1016/j.patter.2021.100407 |
work_keys_str_mv | AT parkjonathanj metaviromicidentificationofdiscriminativegenomicfeaturesinsarscov2usingmachinelearning AT chensidi metaviromicidentificationofdiscriminativegenomicfeaturesinsarscov2usingmachinelearning |