
Performance of methods for SARS-CoV-2 variant detection and abundance estimation within mixed population samples

BACKGROUND: The accurate identification of SARS-CoV-2 (SC2) variants and estimation of their abundance in mixed population samples (e.g., air or wastewater) is imperative for successful surveillance of community level trends. Assessing the performance of SC2 variant composition estimators (VCEs) sho...


Bibliographic Details
Main Authors: Kayikcioglu, Tunc; Amirzadegan, Jasmine; Rand, Hugh; Tesfaldet, Bereket; Timme, Ruth E.; Pettengill, James B.
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2023
Subjects: Bioinformatics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9884472/
https://www.ncbi.nlm.nih.gov/pubmed/36721781
http://dx.doi.org/10.7717/peerj.14596
author Kayikcioglu, Tunc
Amirzadegan, Jasmine
Rand, Hugh
Tesfaldet, Bereket
Timme, Ruth E.
Pettengill, James B.
collection PubMed
description BACKGROUND: The accurate identification of SARS-CoV-2 (SC2) variants and estimation of their abundance in mixed population samples (e.g., air or wastewater) is imperative for successful surveillance of community-level trends. Assessing the performance of SC2 variant composition estimators (VCEs) should improve our confidence in public health decision making. Here, we introduce a linear regression-based VCE and compare its performance to four other VCEs: two re-purposed DNA sequence read classifiers (Kallisto and Kraken2), a maximum-likelihood-based method (Lineage deComposition for Sars-Cov-2 pooled samples (LCS)), and a regression-based method (Freyja).
METHODS: We simulated DNA sequence datasets of known variant composition from both Illumina and Oxford Nanopore Technologies (ONT) platforms and assessed the performance of each VCE. We also evaluated VCE performance using publicly available empirical wastewater samples collected for SC2 surveillance efforts. Bioinformatic analyses were performed with a custom Nextflow workflow (C-WAP, CFSAN Wastewater Analysis Pipeline). Relative root mean squared error (RRMSE) was used as a measure of performance with respect to the known abundance, and concordance correlation coefficient (CCC) was used to measure agreement between pairs of estimators.
RESULTS: Based on our results from simulated data, Kallisto was the most accurate estimator as it had the lowest RRMSE, followed by Freyja. Kallisto and Freyja had the most similar predictions, reflected by the highest CCC metrics. We also found that accuracy was platform- and amplicon-panel-dependent. For example, the accuracy of Freyja was significantly higher with Illumina data compared to ONT data; performance of Kallisto was best with ARTICv4. However, when analyzing empirical data there was poor agreement among methods and variation in the number of variants detected (e.g., Freyja ARTICv4 had a mean of 2.2 variants while Kallisto ARTICv4 had a mean of 10.1 variants).
CONCLUSION: This work provides an understanding of the differences in performance of a number of VCEs and how accurate they are in capturing the relative abundance of SC2 variants within a mixed sample (e.g., wastewater). Such information should help officials gauge the confidence they can have in such data for informing public health decisions.
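The two metrics named in the abstract can be illustrated with a short Python sketch. The abstract does not give the exact formulas, so the RRMSE normalization used here (RMSE divided by the mean true abundance) is an assumption and may differ from the paper's definition; CCC follows Lin's standard concordance correlation coefficient with population (1/n) moments. The function names and the sample abundance vectors are hypothetical.

```python
import math


def rrmse(estimated, truth):
    """Relative RMSE of estimated abundances against known abundances.

    Assumption: normalized by the mean of the true abundances; the paper
    may normalize differently.
    """
    if len(estimated) != len(truth) or not truth:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((e - t) ** 2 for e, t in zip(estimated, truth)) / len(truth)
    mean_truth = sum(truth) / len(truth)
    return math.sqrt(mse) / mean_truth


def ccc(x, y):
    """Lin's concordance correlation coefficient between two estimators."""
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    denom = var_x + var_y + (mean_x - mean_y) ** 2
    if denom == 0:
        # Both inputs are constant and identical; CCC is undefined here.
        return float("nan")
    return 2 * cov_xy / denom


if __name__ == "__main__":
    # Hypothetical three-variant mixture and two made-up estimator outputs.
    known = [0.6, 0.3, 0.1]
    estimator_a = [0.5, 0.35, 0.15]
    estimator_b = [0.55, 0.3, 0.15]
    print("RRMSE of A:", rrmse(estimator_a, known))
    print("CCC of A vs B:", ccc(estimator_a, estimator_b))
```

Unlike a plain correlation, CCC reaches 1 only when the two sets of estimates agree exactly, so a constant offset between two estimators lowers it even when their rankings match.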
format Online
Article
Text
id pubmed-9884472
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-9884472 2023-01-30 Performance of methods for SARS-CoV-2 variant detection and abundance estimation within mixed population samples Kayikcioglu, Tunc; Amirzadegan, Jasmine; Rand, Hugh; Tesfaldet, Bereket; Timme, Ruth E.; Pettengill, James B. PeerJ Bioinformatics PeerJ Inc. 2023-01-26 /pmc/articles/PMC9884472/ /pubmed/36721781 http://dx.doi.org/10.7717/peerj.14596 Text en https://creativecommons.org/publicdomain/zero/1.0/ This is an open access article, free of all copyright, made available under the Creative Commons Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/). This work may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose.
title Performance of methods for SARS-CoV-2 variant detection and abundance estimation within mixed population samples
topic Bioinformatics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9884472/
https://www.ncbi.nlm.nih.gov/pubmed/36721781
http://dx.doi.org/10.7717/peerj.14596