Benchmarking Bioinformatic Tools for Amplicon-Based Sequencing of Norovirus

Bibliographic Details
Main authors: Fitzpatrick, Amy H.; Rupnik, Agnieszka; O’Shea, Helen; Crispie, Fiona; Keaveney, Sinéad; Cotter, Paul D.
Format: Online article, text
Language: English
Published: American Society for Microbiology, 2022
Subjects: Methods
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9888279/
https://www.ncbi.nlm.nih.gov/pubmed/36541780
http://dx.doi.org/10.1128/aem.01522-22
Abstract: In order to survey noroviruses in our environment, both wet-lab and computational methods must be fit for purpose. Using a simulated sequencing data set, denoising-based pipelines (DADA2, Deblur and USEARCH-UNOISE3) and clustering-based pipelines (VSEARCH and FROGS) were compared with respect to their ability to represent composition and sequence information. Open-source classifiers (Ribosomal Database Project [RDP], BLASTn, IDTAXA, QIIME2 naive Bayes and SINTAX) were trained on three different databases: a custom database, the NoroNet database and the Human calicivirus database. Each combination of classifier and database was then compared in terms of classification accuracy. Based on composition analysis, VSEARCH is a robust option for analyzing viral amplicons, although all pipelines could return OTUs with high similarity to the expected sequences. Importantly, pipeline choice could lead to more false positives (DADA2) or to underclassification (FROGS), a key consideration when applying a pipeline for source attribution. Classification was affected more strongly by the choice of classifier than by the database, although disagreement increased at the level of norovirus GII.4 capsid variant designation. We recommend the RDP classifier in conjunction with VSEARCH; however, the underlying database must be maintained for optimal use.

IMPORTANCE: By benchmarking bioinformatic pipelines for analyzing high-throughput sequencing (HTS) data sets, we contribute to method standardization for bioinformatics broadly and for norovirus specifically, for which no officially endorsed methods currently exist. This study provides recommendations for the appropriate analysis and classification of norovirus amplicon HTS data and will be widely applicable during outbreak investigations.
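Because the data set is simulated, the true lineage of every sequence is known, so each classifier's output can be scored directly against it. The sketch below shows one possible way to tally correct, underclassified and misclassified assignments. The lineage tuple format, the three outcome categories and the function names are illustrative assumptions, not the scoring scheme used by the authors.

```python
from collections import Counter
from typing import Dict, Tuple

# Hypothetical lineage format: (genogroup, genotype, variant), e.g.
# ("GII", "GII.4", "Sydney_2012"). A classifier that stops at a higher rank
# (for example because of a confidence cutoff) returns a shorter tuple.
Lineage = Tuple[str, ...]


def score_assignment(truth: Lineage, predicted: Lineage) -> str:
    """Label one assignment as 'correct', 'underclassified' or 'misclassified'."""
    if predicted == truth:
        return "correct"
    # Every assigned rank agrees with the truth, but deeper ranks are missing.
    # An empty prediction (unassigned sequence) therefore counts as underclassified.
    if len(predicted) < len(truth) and truth[: len(predicted)] == predicted:
        return "underclassified"
    # Any disagreement at an assigned rank, or a prediction deeper than the truth.
    return "misclassified"


def summarize(truth: Dict[str, Lineage], predicted: Dict[str, Lineage]) -> Counter:
    """Tally outcomes for every sequence ID that has a known true lineage."""
    # Sequences missing from the classifier output are scored as empty assignments.
    return Counter(score_assignment(truth[sid], predicted.get(sid, ())) for sid in truth)


if __name__ == "__main__":
    truth = {
        "otu1": ("GII", "GII.4", "Sydney_2012"),
        "otu2": ("GII", "GII.17"),
        "otu3": ("GI", "GI.3"),
    }
    predicted = {
        "otu1": ("GII", "GII.4"),   # stops before the capsid variant
        "otu2": ("GII", "GII.17"),  # matches the truth
        "otu3": ("GI", "GI.6"),     # wrong genotype
    }
    counts = summarize(truth, predicted)
    print(counts)
    print("accuracy:", counts["correct"] / sum(counts.values()))
```

Separating underclassification from misclassification mirrors the distinction drawn above: a pipeline that stops early loses resolution, while one that assigns the wrong genotype produces misleading results for source attribution.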

Journal: Appl Environ Microbiol (section: Methods), published online 2022-12-21.
Copyright: © Crown copyright 2023. This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license (https://creativecommons.org/licenses/by/4.0/).