SARS-CoV-2 gene content and COVID-19 mutation impact by comparing 44 Sarbecovirus genomes
Despite its overwhelming clinical importance, the SARS-CoV-2 gene set remains unresolved, hindering dissection of COVID-19 biology. Here, we use comparative genomics to provide a high-confidence protein-coding gene set, characterize protein-level and nucleotide-level evolutionary constraint, and pri...
Main authors: | Jungreis, Irwin; Sealfon, Rachel; Kellis, Manolis |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Cold Spring Harbor Laboratory 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7302193/ https://www.ncbi.nlm.nih.gov/pubmed/32577641 http://dx.doi.org/10.1101/2020.06.02.130955 |
_version_ | 1783547800413274112 |
---|---|
author | Jungreis, Irwin Sealfon, Rachel Kellis, Manolis |
author_facet | Jungreis, Irwin Sealfon, Rachel Kellis, Manolis |
author_sort | Jungreis, Irwin |
collection | PubMed |
description | Despite its overwhelming clinical importance, the SARS-CoV-2 gene set remains unresolved, hindering dissection of COVID-19 biology. Here, we use comparative genomics to provide a high-confidence protein-coding gene set, characterize protein-level and nucleotide-level evolutionary constraint, and prioritize functional mutations from the ongoing COVID-19 pandemic. We select 44 complete Sarbecovirus genomes at evolutionary distances ideally-suited for protein-coding and non-coding element identification, create whole-genome alignments, and quantify protein-coding evolutionary signatures and overlapping constraint. We find strong protein-coding signatures for all named genes and for 3a, 6, 7a, 7b, 8, 9b, and also ORF3c, a novel alternate-frame gene. By contrast, ORF10, and overlapping-ORFs 9c, 3b, and 3d lack protein-coding signatures or convincing experimental evidence and are not protein-coding. Furthermore, we show no other protein-coding genes remain to be discovered. Cross-strain and within-strain evolutionary pressures largely agree at the gene, amino-acid, and nucleotide levels, with some notable exceptions, including fewer-than-expected mutations in nsp3 and Spike subunit S1, and more-than-expected mutations in Nucleocapsid. The latter also shows a cluster of amino-acid-changing variants in otherwise-conserved residues in a predicted B-cell epitope, which may indicate positive selection for immune avoidance. Several Spike-protein mutations, including D614G, which has been associated with increased transmission, disrupt otherwise-perfectly-conserved amino acids, and could be novel adaptations to human hosts. The resulting high-confidence gene set and evolutionary-history annotations provide valuable resources and insights on COVID-19 biology, mutations, and evolution. |
format | Online Article Text |
id | pubmed-7302193 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Cold Spring Harbor Laboratory |
record_format | MEDLINE/PubMed |
spelling | pubmed-73021932020-06-23 SARS-CoV-2 gene content and COVID-19 mutation impact by comparing 44 Sarbecovirus genomes Jungreis, Irwin Sealfon, Rachel Kellis, Manolis bioRxiv Article Despite its overwhelming clinical importance, the SARS-CoV-2 gene set remains unresolved, hindering dissection of COVID-19 biology. Here, we use comparative genomics to provide a high-confidence protein-coding gene set, characterize protein-level and nucleotide-level evolutionary constraint, and prioritize functional mutations from the ongoing COVID-19 pandemic. We select 44 complete Sarbecovirus genomes at evolutionary distances ideally-suited for protein-coding and non-coding element identification, create whole-genome alignments, and quantify protein-coding evolutionary signatures and overlapping constraint. We find strong protein-coding signatures for all named genes and for 3a, 6, 7a, 7b, 8, 9b, and also ORF3c, a novel alternate-frame gene. By contrast, ORF10, and overlapping-ORFs 9c, 3b, and 3d lack protein-coding signatures or convincing experimental evidence and are not protein-coding. Furthermore, we show no other protein-coding genes remain to be discovered. Cross-strain and within-strain evolutionary pressures largely agree at the gene, amino-acid, and nucleotide levels, with some notable exceptions, including fewer-than-expected mutations in nsp3 and Spike subunit S1, and more-than-expected mutations in Nucleocapsid. The latter also shows a cluster of amino-acid-changing variants in otherwise-conserved residues in a predicted B-cell epitope, which may indicate positive selection for immune avoidance. Several Spike-protein mutations, including D614G, which has been associated with increased transmission, disrupt otherwise-perfectly-conserved amino acids, and could be novel adaptations to human hosts. The resulting high-confidence gene set and evolutionary-history annotations provide valuable resources and insights on COVID-19 biology, mutations, and evolution. 
Cold Spring Harbor Laboratory 2020-09-02 /pmc/articles/PMC7302193/ /pubmed/32577641 http://dx.doi.org/10.1101/2020.06.02.130955 Text en https://creativecommons.org/licenses/by-nc-nd/4.0/ It is made available under a CC-BY-NC-ND 4.0 International license (https://creativecommons.org/licenses/by-nc-nd/4.0/). |
spellingShingle | Article Jungreis, Irwin Sealfon, Rachel Kellis, Manolis SARS-CoV-2 gene content and COVID-19 mutation impact by comparing 44 Sarbecovirus genomes |
title | SARS-CoV-2 gene content and COVID-19 mutation impact by comparing 44 Sarbecovirus genomes |
title_full | SARS-CoV-2 gene content and COVID-19 mutation impact by comparing 44 Sarbecovirus genomes |
title_fullStr | SARS-CoV-2 gene content and COVID-19 mutation impact by comparing 44 Sarbecovirus genomes |
title_full_unstemmed | SARS-CoV-2 gene content and COVID-19 mutation impact by comparing 44 Sarbecovirus genomes |
title_short | SARS-CoV-2 gene content and COVID-19 mutation impact by comparing 44 Sarbecovirus genomes |
title_sort | sars-cov-2 gene content and covid-19 mutation impact by comparing 44 sarbecovirus genomes |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7302193/ https://www.ncbi.nlm.nih.gov/pubmed/32577641 http://dx.doi.org/10.1101/2020.06.02.130955 |
work_keys_str_mv | AT jungreisirwin sarscov2genecontentandcovid19mutationimpactbycomparing44sarbecovirusgenomes AT sealfonrachel sarscov2genecontentandcovid19mutationimpactbycomparing44sarbecovirusgenomes AT kellismanolis sarscov2genecontentandcovid19mutationimpactbycomparing44sarbecovirusgenomes |