Optimized quantification of intra-host viral diversity in SARS-CoV-2 and influenza virus sequence data

Bibliographic Details
Main authors: Roder, A. E., Johnson, K. E. E., Knoll, M., Khalfan, M., Wang, B., Schultz-Cherry, S., Banakis, S., Kreitman, A., Mederos, C., Youn, J.-H., Mercado, R., Wang, W., Chung, M., Ruchnewitz, D., Samanovic, M. I., Mulligan, M. J., Lässig, M., Luksza, M., Das, S., Gresham, D., Ghedin, E.
Format: Online Article Text
Language: English
Published: American Society for Microbiology 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10470513/
https://www.ncbi.nlm.nih.gov/pubmed/37389439
http://dx.doi.org/10.1128/mbio.01046-23
author Roder, A. E.
Johnson, K. E. E.
Knoll, M.
Khalfan, M.
Wang, B.
Schultz-Cherry, S.
Banakis, S.
Kreitman, A.
Mederos, C.
Youn, J.-H.
Mercado, R.
Wang, W.
Chung, M.
Ruchnewitz, D.
Samanovic, M. I.
Mulligan, M. J.
Lässig, M.
Luksza, M.
Das, S.
Gresham, D.
Ghedin, E.
author_sort Roder, A. E.
collection PubMed
description High error rates of viral RNA-dependent RNA polymerases lead to diverse intra-host viral populations during infection. Errors made during replication that are not strongly deleterious to the virus can lead to the generation of minority variants. However, accurate detection of minority variants in viral sequence data is complicated by errors introduced during sample preparation and data analysis. We used synthetic RNA controls and simulated data to test seven variant-calling tools across a range of allele frequencies and simulated coverages. We show that choice of variant caller and use of replicate sequencing have the most significant impact on single-nucleotide variant (SNV) discovery and demonstrate how both allele frequency and coverage thresholds impact both false discovery and false-negative rates. When replicates are not available, using a combination of multiple callers with more stringent cutoffs is recommended. We use these parameters to find minority variants in sequencing data from SARS-CoV-2 clinical specimens and provide guidance for studies of intra-host viral diversity using either single replicate data or data from technical replicates. Our study provides a framework for rigorous assessment of technical factors that impact SNV identification in viral samples and establishes heuristics that will inform and improve future studies of intra-host variation, viral diversity, and viral evolution. IMPORTANCE: When viruses replicate inside a host cell, the virus replication machinery makes mistakes. Over time, these mistakes create mutations that result in a diverse population of viruses inside the host. Mutations that are neither lethal to the virus nor strongly beneficial can lead to minority variants that are minor members of the virus population. However, preparing samples for sequencing can also introduce errors that resemble minority variants, resulting in the inclusion of false-positive data if not filtered correctly. 
In this study, we aimed to determine the best methods for identification and quantification of these minority variants by testing the performance of seven commonly used variant-calling tools. We used simulated and synthetic data to test their performance against a true set of variants and then used these studies to inform variant identification in data from SARS-CoV-2 clinical specimens. Together, analyses of our data provide extensive guidance for future studies of viral diversity and evolution.
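The filtering idea summarized in the description (allele-frequency and coverage cutoffs, plus agreement between technical replicates or between several callers) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the `Call` record, the function names, the example positions, and the cutoff values of 0.03 and 200 are hypothetical and are not taken from the article or from any variant-calling tool.

```python
from typing import Iterable, NamedTuple, Set, Tuple


class Call(NamedTuple):
    """One candidate single-nucleotide variant (hypothetical record layout)."""
    pos: int      # genome position
    alt: str      # alternate base
    freq: float   # allele frequency, 0..1
    depth: int    # read coverage at the position


def passing_sites(calls: Iterable[Call], min_freq: float,
                  min_depth: int) -> Set[Tuple[int, str]]:
    """Return (pos, alt) keys for calls meeting both cutoffs."""
    return {(c.pos, c.alt) for c in calls
            if c.freq >= min_freq and c.depth >= min_depth}


def consensus_calls(call_sets, min_freq: float,
                    min_depth: int) -> Set[Tuple[int, str]]:
    """Keep only variants that pass the cutoffs in every call set.

    Each call set may come from a technical replicate or from a
    different variant caller run on the same sample.
    """
    filtered = [passing_sites(cs, min_freq, min_depth) for cs in call_sets]
    return set.intersection(*filtered) if filtered else set()


# Illustrative values only; these are not data from the study.
rep1 = [Call(241, "T", 0.45, 1200), Call(3037, "T", 0.02, 900),
        Call(14408, "T", 0.08, 150)]
rep2 = [Call(241, "T", 0.47, 1100), Call(14408, "T", 0.07, 400),
        Call(23403, "G", 0.04, 800)]

shared = consensus_calls([rep1, rep2], min_freq=0.03, min_depth=200)
```

With these made-up inputs, only the variant at position 241 survives: the others fail a cutoff in one replicate or are missing from it. Raising `min_freq` or `min_depth` trades false discoveries for false negatives, which is the balance the description refers to.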
format Online
Article
Text
id pubmed-10470513
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher American Society for Microbiology
record_format MEDLINE/PubMed
spelling pubmed-10470513 2023-09-01 Optimized quantification of intra-host viral diversity in SARS-CoV-2 and influenza virus sequence data Roder, A. E. Johnson, K. E. E. Knoll, M. Khalfan, M. Wang, B. Schultz-Cherry, S. Banakis, S. Kreitman, A. Mederos, C. Youn, J.-H. Mercado, R. Wang, W. Chung, M. Ruchnewitz, D. Samanovic, M. I. Mulligan, M. J. Lässig, M. Luksza, M. Das, S. Gresham, D. Ghedin, E. mBio Research Article High error rates of viral RNA-dependent RNA polymerases lead to diverse intra-host viral populations during infection. Errors made during replication that are not strongly deleterious to the virus can lead to the generation of minority variants. However, accurate detection of minority variants in viral sequence data is complicated by errors introduced during sample preparation and data analysis. We used synthetic RNA controls and simulated data to test seven variant-calling tools across a range of allele frequencies and simulated coverages. We show that choice of variant caller and use of replicate sequencing have the most significant impact on single-nucleotide variant (SNV) discovery and demonstrate how both allele frequency and coverage thresholds impact both false discovery and false-negative rates. When replicates are not available, using a combination of multiple callers with more stringent cutoffs is recommended. We use these parameters to find minority variants in sequencing data from SARS-CoV-2 clinical specimens and provide guidance for studies of intra-host viral diversity using either single replicate data or data from technical replicates. Our study provides a framework for rigorous assessment of technical factors that impact SNV identification in viral samples and establishes heuristics that will inform and improve future studies of intra-host variation, viral diversity, and viral evolution. IMPORTANCE: When viruses replicate inside a host cell, the virus replication machinery makes mistakes.
Over time, these mistakes create mutations that result in a diverse population of viruses inside the host. Mutations that are neither lethal to the virus nor strongly beneficial can lead to minority variants that are minor members of the virus population. However, preparing samples for sequencing can also introduce errors that resemble minority variants, resulting in the inclusion of false-positive data if not filtered correctly. In this study, we aimed to determine the best methods for identification and quantification of these minority variants by testing the performance of seven commonly used variant-calling tools. We used simulated and synthetic data to test their performance against a true set of variants and then used these studies to inform variant identification in data from SARS-CoV-2 clinical specimens. Together, analyses of our data provide extensive guidance for future studies of viral diversity and evolution. American Society for Microbiology 2023-06-30 /pmc/articles/PMC10470513/ /pubmed/37389439 http://dx.doi.org/10.1128/mbio.01046-23 Text en https://doi.org/10.1128/AuthorWarrantyLicense.v1 This is a work of the U.S. Government and is not subject to copyright protection in the United States. Foreign copyrights may apply.
title Optimized quantification of intra-host viral diversity in SARS-CoV-2 and influenza virus sequence data
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10470513/
https://www.ncbi.nlm.nih.gov/pubmed/37389439
http://dx.doi.org/10.1128/mbio.01046-23