
A Novel SARS-CoV-2 Viral Sequence Bioinformatic Pipeline Has Found Genetic Evidence That the Viral 3′ Untranslated Region (UTR) Is Evolving and Generating Increased Viral Diversity

An unprecedented amount of SARS-CoV-2 sequencing has been performed; however, novel bioinformatic tools to cope with and process these large datasets are needed. Here, we have devised a bioinformatic pipeline that takes SARS-CoV-2 genome sequencing data in FASTA/FASTQ format as input and outputs a single Variant...


Bibliographic Details
Main authors: Farkas, Carlos; Mella, Andy; Turgeon, Maxime; Haigh, Jody J.
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2021
Subjects: Microbiology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8256173/
https://www.ncbi.nlm.nih.gov/pubmed/34234758
http://dx.doi.org/10.3389/fmicb.2021.665041
author Farkas, Carlos
Mella, Andy
Turgeon, Maxime
Haigh, Jody J.
collection PubMed
description An unprecedented amount of SARS-CoV-2 sequencing has been performed; however, novel bioinformatic tools to cope with and process these large datasets are needed. Here, we have devised a bioinformatic pipeline that takes SARS-CoV-2 genome sequencing data in FASTA/FASTQ format as input and outputs a single Variant Call Format (VCF) file that can be processed to obtain variant annotations and perform downstream population genetic testing. As proof of concept, we have analyzed over 229,000 SARS-CoV-2 viral sequences up until November 30, 2020. We have identified over 39,000 variants worldwide, with increased polymorphism spanning the ORF3a gene as well as the 3′ untranslated region (UTR), specifically in the conserved stem-loop region of SARS-CoV-2, which is accumulating greater observed viral diversity relative to chance variation. Our analysis pipeline has also discovered low-frequency SARS-CoV-2 hypermutation (in less than 2% of genomes), likely arising through host immune responses rather than sequencing errors. Among annotated nonsense variants with a population frequency over 1%, recurrent inactivation of the ORF8 gene was found. This inactivation is present in the newly identified B.1.1.7 SARS-CoV-2 lineage that originated in the United Kingdom. Almost all VOC-containing genomes possess one stop codon in the ORF8 gene (Q27*); however, 13% of these genomes also contain another stop codon (K68*), suggesting that ORF8 loss does not interfere with SARS-CoV-2 spread and may play a role in its increased virulence. We have developed this computational pipeline to assist researchers in the rapid analysis and characterization of SARS-CoV-2 variation.
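The workflow summarized in the description (many genomes in, one VCF out, then frequency-based filtering and nonsense-variant annotation) can be illustrated with a toy sketch. This is not the authors' pipeline: it assumes genomes are already aligned to the reference with no insertions or deletions, handles substitutions only, and treats the reference string as a single in-frame coding sequence. The function names (`call_snvs`, `merge_to_vcf`, `frequent_variants`, `creates_stop`) and the `toy_ref` contig name are hypothetical, chosen for illustration.

```python
from collections import Counter

STOP_CODONS = {"TAA", "TAG", "TGA"}


def call_snvs(reference, genome):
    """Return (1-based position, ref base, alt base) for each substitution.

    Assumption: `genome` is pre-aligned to `reference` and has the same length.
    """
    if len(reference) != len(genome):
        raise ValueError("sequences must be pre-aligned to equal length")
    return [
        (i + 1, r, a)
        for i, (r, a) in enumerate(zip(reference.upper(), genome.upper()))
        if r != a and r in "ACGT" and a in "ACGT"  # skip N and other ambiguity codes
    ]


def merge_to_vcf(reference, genomes, chrom="toy_ref"):
    """Aggregate substitutions from all genomes into one list of VCF lines."""
    counts = Counter()
    for genome in genomes:
        counts.update(call_snvs(reference, genome))
    n = len(genomes)  # haploid genomes: one allele per genome
    lines = [
        "##fileformat=VCFv4.2",
        '##INFO=<ID=AC,Number=A,Type=Integer,Description="Alternate allele count">',
        '##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles">',
        '##INFO=<ID=AF,Number=A,Type=Float,Description="Alternate allele frequency">',
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    ]
    for (pos, ref, alt), ac in sorted(counts.items()):
        info = f"AC={ac};AN={n};AF={ac / n:.4f}"
        lines.append(f"{chrom}\t{pos}\t.\t{ref}\t{alt}\t.\t.\t{info}")
    return lines


def frequent_variants(vcf_lines, min_af=0.01):
    """Return (pos, ref, alt, af) for records whose AF is at least `min_af`."""
    hits = []
    for line in vcf_lines:
        if line.startswith("#"):
            continue
        fields = line.split("\t")
        info = dict(item.split("=") for item in fields[7].split(";"))
        af = float(info["AF"])
        if af >= min_af:
            hits.append((int(fields[1]), fields[3], fields[4], af))
    return hits


def creates_stop(cds, pos, alt):
    """True if substituting `alt` at 1-based `pos` turns a sense codon into a stop.

    Assumption: `cds` starts at the first base of a codon (in frame).
    """
    start = (pos - 1) // 3 * 3
    offset = (pos - 1) % 3
    codon = cds[start:start + 3].upper()
    mutated = codon[:offset] + alt.upper() + codon[offset + 1:]
    return codon not in STOP_CODONS and mutated in STOP_CODONS


if __name__ == "__main__":
    ref = "ATGCAAGGT"  # toy coding sequence: Met-Gln-Gly
    genomes = ["ATGCAAGGT", "ATGTAAGGT", "ATGTAAGGT", "ATGCAAGGA"]
    vcf = merge_to_vcf(ref, genomes)
    print("\n".join(vcf))
    for pos, r, a, af in frequent_variants(vcf, min_af=0.01):
        print(pos, r, a, af, "nonsense" if creates_stop(ref, pos, a) else "other")
```

In the toy data, the C-to-T change at position 4 converts a glutamine codon (CAA) into a stop codon (TAA), the same kind of change as the Q27 stop in ORF8 mentioned in the description. A real analysis would need read mapping or genome alignment, indel handling, and annotation against the actual gene coordinates, none of which this sketch attempts.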
format Online
Article
Text
id pubmed-8256173
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-8256173 2021-07-06 Front Microbiol Microbiology
Frontiers Media S.A. 2021-06-21 /pmc/articles/PMC8256173/ /pubmed/34234758 http://dx.doi.org/10.3389/fmicb.2021.665041 Text en Copyright © 2021 Farkas, Mella, Turgeon and Haigh. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title A Novel SARS-CoV-2 Viral Sequence Bioinformatic Pipeline Has Found Genetic Evidence That the Viral 3′ Untranslated Region (UTR) Is Evolving and Generating Increased Viral Diversity
topic Microbiology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8256173/
https://www.ncbi.nlm.nih.gov/pubmed/34234758
http://dx.doi.org/10.3389/fmicb.2021.665041