Considerations for Optimization of High-Throughput Sequencing Bioinformatics Pipelines for Virus Detection

High-throughput sequencing (HTS) has demonstrated capabilities for broad virus detection based upon discovery of known and novel viruses in a variety of samples, including clinical, environmental, and biological. An important goal for HTS applications in biologics is to establish parameter settings...

Bibliographic Details
Main authors: Lambert, Christophe, Braxton, Cassandra, Charlebois, Robert L., Deyati, Avisek, Duncan, Paul, La Neve, Fabio, Malicki, Heather D., Ribrioux, Sebastien, Rozelle, Daniel K., Michaels, Brandye, Sun, Wenping, Yang, Zhihui, Khan, Arifa S.
Format: Online Article Text
Language: English
Published: MDPI 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6213042/
https://www.ncbi.nlm.nih.gov/pubmed/30262776
http://dx.doi.org/10.3390/v10100528
author Lambert, Christophe
Braxton, Cassandra
Charlebois, Robert L.
Deyati, Avisek
Duncan, Paul
La Neve, Fabio
Malicki, Heather D.
Ribrioux, Sebastien
Rozelle, Daniel K.
Michaels, Brandye
Sun, Wenping
Yang, Zhihui
Khan, Arifa S.
collection PubMed
description High-throughput sequencing (HTS) has demonstrated capabilities for broad virus detection based upon discovery of known and novel viruses in a variety of samples, including clinical, environmental, and biological. An important goal for HTS applications in biologics is to establish parameter settings that can afford adequate sensitivity at an acceptable computational cost (computation time, computer memory, storage, expense or/and efficiency), at critical steps in the bioinformatics pipeline, including initial data quality assessment, trimming/cleaning, and assembly (to reduce data volume and increase likelihood of appropriate sequence identification). Additionally, the quality and reliability of the results depend on the availability of a complete and curated viral database for obtaining accurate results; selection of sequence alignment programs and their configuration, that retains specificity for broad virus detection with reduced false-positive signals; removal of host sequences without loss of endogenous viral sequences of interest; and use of a meaningful reporting format, which can retain critical information of the analysis for presentation of readily interpretable data and actionable results. Furthermore, after alignment, both automated and manual evaluation may be needed to verify the results and help assign a potential risk level to residual, unmapped reads. We hope that the collective considerations discussed in this paper aid toward optimization of data analysis pipelines for virus detection by HTS.
format Online
Article
Text
id pubmed-6213042
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-6213042 2018-11-09 Viruses Perspective
MDPI 2018-09-27 /pmc/articles/PMC6213042/ /pubmed/30262776 http://dx.doi.org/10.3390/v10100528 Text en © 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title Considerations for Optimization of High-Throughput Sequencing Bioinformatics Pipelines for Virus Detection
topic Perspective
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6213042/
https://www.ncbi.nlm.nih.gov/pubmed/30262776
http://dx.doi.org/10.3390/v10100528