Strategy and Performance Evaluation of Low-Frequency Variant Calling for SARS-CoV-2 Using Targeted Deep Illumina Sequencing

The ongoing COVID-19 pandemic, caused by SARS-CoV-2, constitutes a tremendous global health issue. Continuous monitoring of the virus has become a cornerstone to make rational decisions on implementing societal and sanitary measures to curtail the virus spread. Additionally, emerging SARS-CoV-2 vari...

Bibliographic Details
Main authors: Van Poelvoorde, Laura A. E., Delcourt, Thomas, Coucke, Wim, Herman, Philippe, De Keersmaecker, Sigrid C. J., Saelens, Xavier, Roosens, Nancy H. C., Vanneste, Kevin
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8548777/
https://www.ncbi.nlm.nih.gov/pubmed/34721349
http://dx.doi.org/10.3389/fmicb.2021.747458
author Van Poelvoorde, Laura A. E.
Delcourt, Thomas
Coucke, Wim
Herman, Philippe
De Keersmaecker, Sigrid C. J.
Saelens, Xavier
Roosens, Nancy H. C.
Vanneste, Kevin
collection PubMed
description The ongoing COVID-19 pandemic, caused by SARS-CoV-2, constitutes a tremendous global health issue. Continuous monitoring of the virus has become a cornerstone to make rational decisions on implementing societal and sanitary measures to curtail the virus spread. Additionally, emerging SARS-CoV-2 variants have increased the need for genomic surveillance to detect particular strains because of their potentially increased transmissibility, pathogenicity and immune escape. Targeted SARS-CoV-2 sequencing of diagnostic and wastewater samples has been explored as an epidemiological surveillance method for the competent authorities. Currently, only the consensus genome sequence of the most abundant strain is taken into consideration for analysis, but multiple variant strains are now circulating in the population. Consequently, in diagnostic samples, potential co-infection(s) by several different variants can occur or quasispecies can develop during an infection in an individual. In wastewater samples, multiple variant strains will often be simultaneously present. Currently, quality criteria are mainly available for constructing the consensus genome sequence, and some guidelines exist for the detection of co-infections and quasispecies in diagnostic samples. The performance of detection and quantification of low-frequency variants using whole genome sequencing (WGS) of SARS-CoV-2 remains largely unknown. Here, we evaluated the detection and quantification of mutations present at low abundances using the mutations defining the SARS-CoV-2 lineage B.1.1.7 (alpha variant) as a case study. Real sequencing data were in silico modified by introducing mutations of interest into raw wild-type sequencing data, or by mixing wild-type and mutant raw sequencing data, to construct mixed samples subjected to WGS using a tiling amplicon-based targeted metagenomics approach and Illumina sequencing. 
As anticipated, higher variation and lower sensitivity were observed at lower coverages and allelic frequencies. We found that detecting all low-frequency variants at abundances of 10, 5, 3, and 1% requires sequencing coverages of at least 250, 500, 1500, and 10,000×, respectively. Although the variability of estimated allelic frequencies increased at lower coverages and lower allelic frequencies, its impact on reliable quantification was limited. This study provides a highly sensitive low-frequency variant detection approach, which is publicly available at https://galaxy.sciensano.be, and specific recommendations for minimum sequencing coverages to detect clade-defining mutations at certain allelic frequencies. This approach will be useful to detect and quantify low-frequency variants in both diagnostic (e.g., co-infections and quasispecies) and wastewater [e.g., multiple variants of concern (VOCs)] samples.
format Online
Article
Text
id pubmed-8548777
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-8548777 2021-10-28 Strategy and Performance Evaluation of Low-Frequency Variant Calling for SARS-CoV-2 Using Targeted Deep Illumina Sequencing Van Poelvoorde, Laura A. E. Delcourt, Thomas Coucke, Wim Herman, Philippe De Keersmaecker, Sigrid C. J. Saelens, Xavier Roosens, Nancy H. C. Vanneste, Kevin Front Microbiol Microbiology Frontiers Media S.A. 2021-10-13 /pmc/articles/PMC8548777/ /pubmed/34721349 http://dx.doi.org/10.3389/fmicb.2021.747458 Text en Copyright © 2021 Van Poelvoorde, Delcourt, Coucke, Herman, De Keersmaecker, Saelens, Roosens and Vanneste. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Strategy and Performance Evaluation of Low-Frequency Variant Calling for SARS-CoV-2 Using Targeted Deep Illumina Sequencing
topic Microbiology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8548777/
https://www.ncbi.nlm.nih.gov/pubmed/34721349
http://dx.doi.org/10.3389/fmicb.2021.747458