A43 Translational research: NGS metagenomics into clinical diagnostics
Format: Online, Article, Text
Language: English
Published: Oxford University Press, 2019
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6735915/ and http://dx.doi.org/10.1093/ve/vez002.042
Summary: As research next-generation sequencing (NGS) metagenomic pipelines transition to clinical diagnostics, the user base shifts from bioinformaticians to biologists, medical doctors, and lab technicians. Besides the obvious need to benchmark the pipelines and tools and to assess their diagnostic outcomes, other focus points remain: reproducibility, data immutability, user-friendliness, portability/scalability, privacy, and a clear audit trail. We have a research metagenomics pipeline that takes raw FASTQ files and produces annotated contigs, but it is too complicated for non-bioinformaticians. Here, we present preliminary findings from adapting this pipeline for clinical diagnostics. We drew on information from relevant fora (www.bioinfo-core.org) and on the experiences and publications of fellow bioinformaticians at other institutes (COMPARE, UBC, and LUMC). From this information, we designed a robust and user-friendly storage and analysis workflow for non-bioinformaticians in a clinical setting. Using Conda [https://conda.io] and Docker containers [http://www.docker.com], we made the disparate pipeline processes self-contained and reproducible. Furthermore, we moved all pipeline settings into a separate JSON file. After every analysis, the pipeline settings and virtual-environment recipes will be archived immutably under a persistent unique identifier, which allows precise long-term reproducibility. Likewise, after every run the raw data and final products will be archived automatically, in compliance with data retention laws and guidelines. All pipeline processes are parallelized and automated via Snakemake¹, so end users need no coding skills. In addition, interactive web reports such as MultiQC [http://multiqc.info] and Krona² are generated automatically. By combining Snakemake, Conda, and containers, the pipeline is highly portable: it can easily be scaled up for outbreak situations or scaled down to reduce costs.
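The abstract does not say how the persistent unique identifier for archived settings is generated. One way to obtain an identifier that is stable for identical settings is to hash a canonical serialization of the JSON settings. The sketch below assumes a SHA-256 content hash as the identifier and a plain directory of read-only files as a stand-in for immutable storage; the function names, directory layout, and example settings keys are hypothetical and not taken from the authors' pipeline.

```python
import hashlib
import json
import stat
import tempfile
from pathlib import Path


def settings_identifier(settings: dict) -> str:
    """Return a SHA-256 hex digest of a canonical JSON serialization.

    Sorting keys and fixing separators means identical settings always
    yield the same identifier, regardless of key order in the file.
    """
    canonical = json.dumps(settings, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def archive_settings(settings: dict, archive_dir: Path) -> Path:
    """Write settings to <archive_dir>/<identifier>.json and mark it read-only.

    Hypothetical layout; read-only permissions stand in for immutable storage.
    An existing archive entry is never overwritten.
    """
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / f"{settings_identifier(settings)}.json"
    if not target.exists():
        target.write_text(json.dumps(settings, sort_keys=True, indent=2))
        # Drop all write bits so the archived copy cannot be edited in place.
        target.chmod(stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    return target


if __name__ == "__main__":
    # Hypothetical settings; the real pipeline's JSON keys are not published.
    settings = {"min_read_length": 50, "threads": 8}
    with tempfile.TemporaryDirectory() as tmp:
        archived = archive_settings(settings, Path(tmp) / "settings_archive")
        print("archived as", archived.name)
```

Because keys are sorted before hashing, reordering keys in the JSON file does not change the identifier, while changing any value does. File permissions are only a weak form of immutability; a production archive would more likely rely on write-once storage.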
Because patient privacy is a concern, the pipeline automatically removes human genetic data. Moreover, all source code will be stored on an internal GitLab server, which, combined with the archived data, ensures a clear audit trail. Nevertheless, challenges remain: (1) reproducible reference databases, e.g. the ability to revert to an older version to reproduce earlier analyses; (2) a user-friendly GUI; (3) connecting the pipeline and NGS data to the in-house LIMS; and (4) efficient long-term storage, e.g. lossless compression algorithms. Even so, this work is a step forward toward user-friendly clinical diagnostic workflows.
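The abstract does not describe how human genetic data is removed. A common approach is to align reads against a human reference genome and discard those that map. The sketch below covers only the final filtering step: it assumes the IDs of human reads have already been determined by an upstream alignment tool (not shown) and keeps every FASTQ record whose ID is not in that set. The function names and example reads are hypothetical.

```python
from typing import Iterable, Iterator, Set, Tuple

# (header, sequence, separator, quality)
FastqRecord = Tuple[str, str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield 4-line FASTQ records from an iterable of text lines."""
    it = iter(lines)
    for raw_header in it:
        header = raw_header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines, e.g. at end of file
        if not header.startswith("@"):
            raise ValueError(f"malformed FASTQ header: {header!r}")
        try:
            seq, sep, qual = [next(it).rstrip("\r\n") for _ in range(3)]
        except StopIteration:
            raise ValueError("truncated FASTQ record") from None
        yield header, seq, sep, qual


def read_id(header: str) -> str:
    """Return the read ID: the first whitespace-delimited token after '@'."""
    parts = header[1:].split()
    return parts[0] if parts else ""


def remove_human_reads(lines: Iterable[str], human_ids: Set[str]) -> Iterator[str]:
    """Yield FASTQ text for every record whose ID is not in human_ids."""
    for header, seq, sep, qual in read_fastq(lines):
        if read_id(header) not in human_ids:
            yield f"{header}\n{seq}\n{sep}\n{qual}\n"


if __name__ == "__main__":
    # Hypothetical in-memory reads; a real run would stream a FASTQ file.
    fastq = [
        "@r1 sample\n", "ACGT\n", "+\n", "IIII\n",
        "@r2 sample\n", "GGCC\n", "+\n", "IIII\n",
    ]
    # Hypothetical set of read IDs flagged as human by an upstream aligner.
    human_ids = {"r2"}
    print("".join(remove_human_reads(fastq, human_ids)), end="")
```

For paired-end data, the same ID set would need to be applied to both mate files so that pairs stay in sync. IDs are compared as the first token of the header, so mate suffixes such as `/1` must match how the upstream tool reports them.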