SARS-CoV-2 gene content and COVID-19 mutation impact by comparing 44 Sarbecovirus genomes

Despite its overwhelming clinical importance, the SARS-CoV-2 gene set remains unresolved, hindering dissection of COVID-19 biology. Here, we use comparative genomics to provide a high-confidence protein-coding gene set, characterize protein-level and nucleotide-level evolutionary constraint, and pri...

Bibliographic Details
Main Authors: Jungreis, Irwin, Sealfon, Rachel, Kellis, Manolis
Format: Online Article Text
Language: English
Published: American Journal Experts 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7536840/
https://www.ncbi.nlm.nih.gov/pubmed/33024961
http://dx.doi.org/10.21203/rs.3.rs-80345/v1
author Jungreis, Irwin
Sealfon, Rachel
Kellis, Manolis
collection PubMed
description Despite its overwhelming clinical importance, the SARS-CoV-2 gene set remains unresolved, hindering dissection of COVID-19 biology. Here, we use comparative genomics to provide a high-confidence protein-coding gene set, characterize protein-level and nucleotide-level evolutionary constraint, and prioritize functional mutations from the ongoing COVID-19 pandemic. We select 44 complete Sarbecovirus genomes at evolutionary distances ideally-suited for protein-coding and non-coding element identification, create whole-genome alignments, and quantify protein-coding evolutionary signatures and overlapping constraint. We find strong protein-coding signatures for all named genes and for 3a, 6, 7a, 7b, 8, 9b, and also ORF3c, a novel alternate-frame gene. By contrast, ORF10, and overlapping-ORFs 9c, 3b, and 3d lack protein-coding signatures or convincing experimental evidence and are not protein-coding. Furthermore, we show no other protein-coding genes remain to be discovered. Cross-strain and within-strain evolutionary pressures largely agree at the gene, amino-acid, and nucleotide levels, with some notable exceptions, including fewer-than-expected mutations in nsp3 and Spike subunit S1, and more-than-expected mutations in Nucleocapsid. The latter also shows a cluster of amino-acid-changing variants in otherwise-conserved residues in a predicted B-cell epitope, which may indicate positive selection for immune avoidance. Several Spike-protein mutations, including D614G, which has been associated with increased transmission, disrupt otherwise-perfectly-conserved amino acids, and could be novel adaptations to human hosts. The resulting high-confidence gene set and evolutionary-history annotations provide valuable resources and insights on COVID-19 biology, mutations, and evolution.
format Online
Article
Text
id pubmed-7536840
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher American Journal Experts
record_format MEDLINE/PubMed
spelling pubmed-7536840 2020-10-07 SARS-CoV-2 gene content and COVID-19 mutation impact by comparing 44 Sarbecovirus genomes Jungreis, Irwin Sealfon, Rachel Kellis, Manolis Res Sq Article
American Journal Experts 2020-10-01 /pmc/articles/PMC7536840/ /pubmed/33024961 http://dx.doi.org/10.21203/rs.3.rs-80345/v1 Text en https://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use.
title SARS-CoV-2 gene content and COVID-19 mutation impact by comparing 44 Sarbecovirus genomes
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7536840/
https://www.ncbi.nlm.nih.gov/pubmed/33024961
http://dx.doi.org/10.21203/rs.3.rs-80345/v1