Benchmarking Bioinformatic Tools for Amplicon-Based Sequencing of Norovirus
In order to survey noroviruses in our environment, it is essential that both wet-lab and computational methods are fit for purpose. Using a simulated sequencing data set, denoising-based (DADA2, Deblur and USEARCH-UNOISE3) and clustering-based pipelines (VSEARCH and FROGS) were compared with respect...
Main Authors: | Fitzpatrick, Amy H., Rupnik, Agnieszka, O’Shea, Helen, Crispie, Fiona, Keaveney, Sinéad, Cotter, Paul D. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | American Society for Microbiology 2022 |
Subjects: | Methods |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9888279/ https://www.ncbi.nlm.nih.gov/pubmed/36541780 http://dx.doi.org/10.1128/aem.01522-22 |
_version_ | 1784880502382002176 |
---|---|
author | Fitzpatrick, Amy H. Rupnik, Agnieszka O’Shea, Helen Crispie, Fiona Keaveney, Sinéad Cotter, Paul D. |
author_sort | Fitzpatrick, Amy H. |
collection | PubMed |
description | In order to survey noroviruses in our environment, it is essential that both wet-lab and computational methods are fit for purpose. Using a simulated sequencing data set, denoising-based (DADA2, Deblur and USEARCH-UNOISE3) and clustering-based pipelines (VSEARCH and FROGS) were compared with respect to their ability to represent composition and sequence information. Open source classifiers (Ribosomal Database Project [RDP], BLASTn, IDTAXA, QIIME2 naive Bayes, and SINTAX) were trained using three different databases: a custom database, the NoroNet database, and the Human calicivirus database. Each classifier and database combination was compared from the perspective of their classification accuracy. VSEARCH provides a robust option for analyzing viral amplicons based on composition analysis; however, all pipelines could return OTUs with high similarity to the expected sequences. Importantly, pipeline choice could lead to more false positives (DADA2) or underclassification (FROGS), a key aspect when considering pipeline application for source attribution. Classification was more strongly impacted by the classifier than the database, although disagreement increased with norovirus GII.4 capsid variant designation. We recommend the use of the RDP classifier in conjunction with VSEARCH; however, maintenance of the underlying database is essential for optimal use. IMPORTANCE In benchmarking bioinformatic pipelines for analyzing high-throughput sequencing (HTS) data sets, we provide method standardization for bioinformatics broadly and specifically for norovirus in situations for which no officially endorsed methods exist at present. This study provides recommendations for the appropriate analysis and classification of norovirus amplicon HTS data and will be widely applicable during outbreak investigations. |
format | Online Article Text |
id | pubmed-9888279 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | American Society for Microbiology |
record_format | MEDLINE/PubMed |
spelling | pubmed-9888279 2023-02-01 Benchmarking Bioinformatic Tools for Amplicon-Based Sequencing of Norovirus Fitzpatrick, Amy H. Rupnik, Agnieszka O’Shea, Helen Crispie, Fiona Keaveney, Sinéad Cotter, Paul D. Appl Environ Microbiol Methods In order to survey noroviruses in our environment, it is essential that both wet-lab and computational methods are fit for purpose. Using a simulated sequencing data set, denoising-based (DADA2, Deblur and USEARCH-UNOISE3) and clustering-based pipelines (VSEARCH and FROGS) were compared with respect to their ability to represent composition and sequence information. Open source classifiers (Ribosomal Database Project [RDP], BLASTn, IDTAXA, QIIME2 naive Bayes, and SINTAX) were trained using three different databases: a custom database, the NoroNet database, and the Human calicivirus database. Each classifier and database combination was compared from the perspective of their classification accuracy. VSEARCH provides a robust option for analyzing viral amplicons based on composition analysis; however, all pipelines could return OTUs with high similarity to the expected sequences. Importantly, pipeline choice could lead to more false positives (DADA2) or underclassification (FROGS), a key aspect when considering pipeline application for source attribution. Classification was more strongly impacted by the classifier than the database, although disagreement increased with norovirus GII.4 capsid variant designation. We recommend the use of the RDP classifier in conjunction with VSEARCH; however, maintenance of the underlying database is essential for optimal use. IMPORTANCE In benchmarking bioinformatic pipelines for analyzing high-throughput sequencing (HTS) data sets, we provide method standardization for bioinformatics broadly and specifically for norovirus in situations for which no officially endorsed methods exist at present. This study provides recommendations for the appropriate analysis and classification of norovirus amplicon HTS data and will be widely applicable during outbreak investigations. American Society for Microbiology 2022-12-21 /pmc/articles/PMC9888279/ /pubmed/36541780 http://dx.doi.org/10.1128/aem.01522-22 Text en © Crown copyright 2023. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Methods Fitzpatrick, Amy H. Rupnik, Agnieszka O’Shea, Helen Crispie, Fiona Keaveney, Sinéad Cotter, Paul D. Benchmarking Bioinformatic Tools for Amplicon-Based Sequencing of Norovirus |
title | Benchmarking Bioinformatic Tools for Amplicon-Based Sequencing of Norovirus |
topic | Methods |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9888279/ https://www.ncbi.nlm.nih.gov/pubmed/36541780 http://dx.doi.org/10.1128/aem.01522-22 |