SARS-CoV-2 gene content and COVID-19 mutation impact by comparing 44 Sarbecovirus genomes
Despite its overwhelming clinical importance, the SARS-CoV-2 gene set remains unresolved, hindering dissection of COVID-19 biology. Here, we use comparative genomics to provide a high-confidence protein-coding gene set, characterize protein-level and nucleotide-level evolutionary constraint, and pri...
Main authors: | Jungreis, Irwin; Sealfon, Rachel; Kellis, Manolis |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | American Journal Experts 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7536840/ https://www.ncbi.nlm.nih.gov/pubmed/33024961 http://dx.doi.org/10.21203/rs.3.rs-80345/v1 |
_version_ | 1783590625827880960 |
---|---|
author | Jungreis, Irwin; Sealfon, Rachel; Kellis, Manolis |
author_facet | Jungreis, Irwin; Sealfon, Rachel; Kellis, Manolis |
author_sort | Jungreis, Irwin |
collection | PubMed |
description | Despite its overwhelming clinical importance, the SARS-CoV-2 gene set remains unresolved, hindering dissection of COVID-19 biology. Here, we use comparative genomics to provide a high-confidence protein-coding gene set, characterize protein-level and nucleotide-level evolutionary constraint, and prioritize functional mutations from the ongoing COVID-19 pandemic. We select 44 complete Sarbecovirus genomes at evolutionary distances ideally-suited for protein-coding and non-coding element identification, create whole-genome alignments, and quantify protein-coding evolutionary signatures and overlapping constraint. We find strong protein-coding signatures for all named genes and for 3a, 6, 7a, 7b, 8, 9b, and also ORF3c, a novel alternate-frame gene. By contrast, ORF10, and overlapping-ORFs 9c, 3b, and 3d lack protein-coding signatures or convincing experimental evidence and are not protein-coding. Furthermore, we show no other protein-coding genes remain to be discovered. Cross-strain and within-strain evolutionary pressures largely agree at the gene, amino-acid, and nucleotide levels, with some notable exceptions, including fewer-than-expected mutations in nsp3 and Spike subunit S1, and more-than-expected mutations in Nucleocapsid. The latter also shows a cluster of amino-acid-changing variants in otherwise-conserved residues in a predicted B-cell epitope, which may indicate positive selection for immune avoidance. Several Spike-protein mutations, including D614G, which has been associated with increased transmission, disrupt otherwise-perfectly-conserved amino acids, and could be novel adaptations to human hosts. The resulting high-confidence gene set and evolutionary-history annotations provide valuable resources and insights on COVID-19 biology, mutations, and evolution. |
format | Online Article Text |
id | pubmed-7536840 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | American Journal Experts |
record_format | MEDLINE/PubMed |
spelling | pubmed-7536840 2020-10-07 SARS-CoV-2 gene content and COVID-19 mutation impact by comparing 44 Sarbecovirus genomes Jungreis, Irwin; Sealfon, Rachel; Kellis, Manolis Res Sq Article American Journal Experts 2020-10-01 /pmc/articles/PMC7536840/ /pubmed/33024961 http://dx.doi.org/10.21203/rs.3.rs-80345/v1 Text en https://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use. |
title | SARS-CoV-2 gene content and COVID-19 mutation impact by comparing 44 Sarbecovirus genomes |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7536840/ https://www.ncbi.nlm.nih.gov/pubmed/33024961 http://dx.doi.org/10.21203/rs.3.rs-80345/v1 |