Canary: an atomic pipeline for clinical amplicon assays

Bibliographic Details
Main authors: Doig, Kenneth D., Ellul, Jason, Fellowes, Andrew, Thompson, Ella R., Ryland, Georgina, Blombery, Piers, Papenfuss, Anthony T., Fox, Stephen B.
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects: Software
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5732437/
https://www.ncbi.nlm.nih.gov/pubmed/29246107
http://dx.doi.org/10.1186/s12859-017-1950-z
_version_ 1783286697294823424
author Doig, Kenneth D.
Ellul, Jason
Fellowes, Andrew
Thompson, Ella R.
Ryland, Georgina
Blombery, Piers
Papenfuss, Anthony T.
Fox, Stephen B.
author_sort Doig, Kenneth D.
collection PubMed
description BACKGROUND: High-throughput sequencing requires bioinformatics pipelines to process large volumes of data into meaningful variants that can be translated into a clinical report. These pipelines often suffer from a number of shortcomings: they lack robustness and have many components written in multiple languages, each with a variety of resource requirements. Pipeline components must be linked together with a workflow system to process FASTQ files through to a VCF file of variants. Crafting these pipelines requires considerable bioinformatics and IT skills beyond the reach of many clinical laboratories. RESULTS: Here we present Canary, a single program that can be run on a laptop, which takes FASTQ files from amplicon assays through to an annotated VCF file ready for clinical analysis. Canary can be installed and run with a single command using Docker containerization or run as a single JAR file on a wide range of platforms. Although it is a single utility, Canary performs all the functions present in more complex and unwieldy pipelines. All variants identified by Canary are 3′ shifted and represented in their most parsimonious form to provide a consistent nomenclature, irrespective of sequencing variation. Further, proximate in-phase variants are represented as a single HGVS ‘delins’ variant. This allows correct nomenclature and consequences to be ascribed to complex multi-nucleotide polymorphisms (MNPs), which are otherwise difficult to represent and interpret. Variants can also be annotated with hundreds of attributes sourced from MyVariant.info to give up-to-date details on pathogenicity, population statistics and in silico predictors. CONCLUSIONS: Canary has been used at the Peter MacCallum Cancer Centre in Melbourne for the last 2 years for the processing of clinical sequencing data. By encapsulating clinical features in a single, easily installed executable, Canary makes sequencing more accessible to all pathology laboratories. Canary is available for download as source or a Docker image at https://github.com/PapenfussLab/Canary under a GPL-3.0 License. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-017-1950-z) contains supplementary material, which is available to authorized users.
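The abstract's description of 3′ shifting, parsimonious representation and 'delins' merging can be illustrated with a minimal Python sketch. This is not Canary's own code: the function names, the tuple layout (1-based position, reference allele, alternate allele) and the toy sequences are assumptions made for illustration, and the sketch assumes the caller has already decided which variants are in phase and close enough to merge.

```python
def trim_parsimonious(pos, ref, alt):
    """Drop bases shared by ref and alt so only the differing bases remain.

    pos is the 1-based coordinate of ref[0] (hypothetical convention).
    """
    # Strip the common suffix first, then the common prefix.
    while ref and alt and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    while ref and alt and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


def shift_3prime(seq, pos, ref, alt):
    """Move a pure insertion or deletion as far 3' as the reference allows.

    seq is the reference sequence; substitutions are returned trimmed only.
    """
    pos, ref, alt = trim_parsimonious(pos, ref, alt)
    if bool(ref) == bool(alt):
        return pos, ref, alt  # substitution/delins, or no change at all
    unit = ref or alt  # the deleted or inserted bases
    while True:
        nxt = pos - 1 + len(ref)  # 0-based index of the next reference base
        if nxt >= len(seq) or seq[nxt] != unit[0]:
            break
        unit = unit[1:] + unit[0]  # rotate the unit one base to the right
        pos += 1
    return (pos, unit, "") if ref else (pos, "", unit)


def merge_in_phase(seq, variants):
    """Combine in-phase, non-overlapping substitutions into one delins.

    variants is a list of (pos, ref, alt) with non-empty ref (assumption).
    """
    variants = sorted(variants)
    start = variants[0][0]
    end = max(p + len(r) - 1 for p, r, _ in variants)
    parts, cursor = [], start
    for p, r, a in variants:
        parts.append(seq[cursor - 1:p - 1])  # unchanged bases in between
        parts.append(a)
        cursor = p + len(r)
    return trim_parsimonious(start, seq[start - 1:end], "".join(parts))


if __name__ == "__main__":
    # Deleting one A from the AAA run is reported at the 3'-most A.
    print(shift_3prime("GCAAAT", 2, "CA", "C"))
    # Two nearby SNVs become a single three-base delins.
    print(merge_in_phase("GCATTG", [(3, "A", "G"), (5, "T", "C")]))
```

In the first call, a left-aligned VCF-style deletion in a homopolymer run ends up at the last A of the run, which is where the HGVS 3′ rule places it. In the second, the unchanged base between the two substitutions is carried into the merged allele so the result can be described as one delins.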
format Online
Article
Text
id pubmed-5732437
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5732437 2017-12-21 Canary: an atomic pipeline for clinical amplicon assays Doig, Kenneth D. Ellul, Jason Fellowes, Andrew Thompson, Ella R. Ryland, Georgina Blombery, Piers Papenfuss, Anthony T. Fox, Stephen B. BMC Bioinformatics Software BioMed Central 2017-12-15 /pmc/articles/PMC5732437/ /pubmed/29246107 http://dx.doi.org/10.1186/s12859-017-1950-z Text en © The Author(s). 2017 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Canary: an atomic pipeline for clinical amplicon assays
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5732437/
https://www.ncbi.nlm.nih.gov/pubmed/29246107
http://dx.doi.org/10.1186/s12859-017-1950-z