SARS-CoV-2 gene content and COVID-19 mutation impact by comparing 44 Sarbecovirus genomes
Despite its clinical importance, the SARS-CoV-2 gene set remains unresolved, hindering dissection of COVID-19 biology. We use comparative genomics to provide a high-confidence protein-coding gene set, characterize evolutionary constraint, and prioritize functional mutations. We select 44 Sarbecoviru...
Main Authors: | Jungreis, Irwin; Sealfon, Rachel; Kellis, Manolis |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2021 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8113528/ https://www.ncbi.nlm.nih.gov/pubmed/33976134 http://dx.doi.org/10.1038/s41467-021-22905-7 |
_version_ | 1783690880364838912 |
---|---|
author | Jungreis, Irwin; Sealfon, Rachel; Kellis, Manolis |
author_sort | Jungreis, Irwin |
collection | PubMed |
description | Despite its clinical importance, the SARS-CoV-2 gene set remains unresolved, hindering dissection of COVID-19 biology. We use comparative genomics to provide a high-confidence protein-coding gene set, characterize evolutionary constraint, and prioritize functional mutations. We select 44 Sarbecovirus genomes at ideally-suited evolutionary distances, and quantify protein-coding evolutionary signatures and overlapping constraint. We find strong protein-coding signatures for ORFs 3a, 6, 7a, 7b, 8, 9b, and a novel alternate-frame gene, ORF3c, whereas ORFs 2b, 3d/3d-2, 3b, 9c, and 10 lack protein-coding signatures or convincing experimental evidence of protein-coding function. Furthermore, we show no other conserved protein-coding genes remain to be discovered. Mutation analysis suggests ORF8 contributes to within-individual fitness but not person-to-person transmission. Cross-strain and within-strain evolutionary pressures agree, except for fewer-than-expected within-strain mutations in nsp3 and S1, and more-than-expected in nucleocapsid, which shows a cluster of mutations in a predicted B-cell epitope, suggesting immune-avoidance selection. Evolutionary histories of residues disrupted by spike-protein substitutions D614G, N501Y, E484K, and K417N/T provide clues about their biology, and we catalog likely-functional co-inherited mutations. Previously reported RNA-modification sites show no enrichment for conservation. Here we report a high-confidence gene set and evolutionary-history annotations providing valuable resources and insights on SARS-CoV-2 biology, mutations, and evolution. |
format | Online Article Text |
id | pubmed-8113528 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-8113528 2021-05-14 SARS-CoV-2 gene content and COVID-19 mutation impact by comparing 44 Sarbecovirus genomes Jungreis, Irwin; Sealfon, Rachel; Kellis, Manolis Nat Commun Article Nature Publishing Group UK 2021-05-11 /pmc/articles/PMC8113528/ /pubmed/33976134 http://dx.doi.org/10.1038/s41467-021-22905-7 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit https://creativecommons.org/licenses/by/4.0/. |
title | SARS-CoV-2 gene content and COVID-19 mutation impact by comparing 44 Sarbecovirus genomes |
title_sort | sars-cov-2 gene content and covid-19 mutation impact by comparing 44 sarbecovirus genomes |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8113528/ https://www.ncbi.nlm.nih.gov/pubmed/33976134 http://dx.doi.org/10.1038/s41467-021-22905-7 |