Benchmarking workflows to assess performance and suitability of germline variant calling pipelines in clinical diagnostic assays

BACKGROUND: Benchmarking the performance of complex analytical pipelines is an essential part of developing Lab Developed Tests (LDT). Reference samples and benchmark calls published by Genome in a Bottle (GIAB) consortium have enabled the evaluation of analytical methods. The performance of such me...

Bibliographic Details
Main authors:	Krishnan, Vandhana; Utiramerur, Sowmithri; Ng, Zena; Datta, Somalee; Snyder, Michael P.; Ashley, Euan A.
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2021
Subjects:	Methodology Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7903625/ https://www.ncbi.nlm.nih.gov/pubmed/33627090 http://dx.doi.org/10.1186/s12859-020-03934-3

_version_	1783654772230848512
author	Krishnan, Vandhana Utiramerur, Sowmithri Ng, Zena Datta, Somalee Snyder, Michael P. Ashley, Euan A.
author_facet	Krishnan, Vandhana Utiramerur, Sowmithri Ng, Zena Datta, Somalee Snyder, Michael P. Ashley, Euan A.
author_sort	Krishnan, Vandhana
collection	PubMed
description	BACKGROUND: Benchmarking the performance of complex analytical pipelines is an essential part of developing Lab Developed Tests (LDT). Reference samples and benchmark calls published by Genome in a Bottle (GIAB) consortium have enabled the evaluation of analytical methods. The performance of such methods is not uniform across the different genomic regions of interest and variant types. Several benchmarking methods such as hap.py, vcfeval, and vcflib are available to assess the analytical performance characteristics of variant calling algorithms. However, assessing the performance characteristics of an overall LDT assay still requires stringing together several such methods and experienced bioinformaticians to interpret the results. In addition, these methods are dependent on the hardware, operating system and other software libraries, making it impossible to reliably repeat the analytical assessment, when any of the underlying dependencies change in the assay. Here we present a scalable and reproducible, cloud-based benchmarking workflow that is independent of the laboratory and the technician executing the workflow, or the underlying compute hardware used to rapidly and continually assess the performance of LDT assays, across their regions of interest and reportable range, using a broad set of benchmarking samples. RESULTS: The benchmarking workflow was used to evaluate the performance characteristics for secondary analysis pipelines commonly used by Clinical Genomics laboratories in their LDT assays such as the GATK HaplotypeCaller v3.7 and the SpeedSeq workflow based on FreeBayes v0.9.10. Five reference sample truth sets generated by Genome in a Bottle (GIAB) consortium, six samples from the Personal Genome Project (PGP) and several samples with validated clinically relevant variants from the Centers for Disease Control were used in this work. 
The performance characteristics were evaluated and compared for multiple reportable ranges, such as whole exome and the clinical exome. CONCLUSIONS: We have implemented a benchmarking workflow for clinical diagnostic laboratories that generates metrics such as specificity, precision and sensitivity for germline SNPs and InDels within a reportable range using whole exome or genome sequencing data. Combining these benchmarking results with validation using known variants of clinical significance in publicly available cell lines, we were able to establish the performance of variant calling pipelines in a clinical setting.
format	Online Article Text
id	pubmed-7903625
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-79036252021-03-01 Benchmarking workflows to assess performance and suitability of germline variant calling pipelines in clinical diagnostic assays Krishnan, Vandhana Utiramerur, Sowmithri Ng, Zena Datta, Somalee Snyder, Michael P. Ashley, Euan A. BMC Bioinformatics Methodology Article BACKGROUND: Benchmarking the performance of complex analytical pipelines is an essential part of developing Lab Developed Tests (LDT). Reference samples and benchmark calls published by Genome in a Bottle (GIAB) consortium have enabled the evaluation of analytical methods. The performance of such methods is not uniform across the different genomic regions of interest and variant types. Several benchmarking methods such as hap.py, vcfeval, and vcflib are available to assess the analytical performance characteristics of variant calling algorithms. However, assessing the performance characteristics of an overall LDT assay still requires stringing together several such methods and experienced bioinformaticians to interpret the results. In addition, these methods are dependent on the hardware, operating system and other software libraries, making it impossible to reliably repeat the analytical assessment, when any of the underlying dependencies change in the assay. Here we present a scalable and reproducible, cloud-based benchmarking workflow that is independent of the laboratory and the technician executing the workflow, or the underlying compute hardware used to rapidly and continually assess the performance of LDT assays, across their regions of interest and reportable range, using a broad set of benchmarking samples. RESULTS: The benchmarking workflow was used to evaluate the performance characteristics for secondary analysis pipelines commonly used by Clinical Genomics laboratories in their LDT assays such as the GATK HaplotypeCaller v3.7 and the SpeedSeq workflow based on FreeBayes v0.9.10. 
Five reference sample truth sets generated by Genome in a Bottle (GIAB) consortium, six samples from the Personal Genome Project (PGP) and several samples with validated clinically relevant variants from the Centers for Disease Control were used in this work. The performance characteristics were evaluated and compared for multiple reportable ranges, such as whole exome and the clinical exome. CONCLUSIONS: We have implemented a benchmarking workflow for clinical diagnostic laboratories that generates metrics such as specificity, precision and sensitivity for germline SNPs and InDels within a reportable range using whole exome or genome sequencing data. Combining these benchmarking results with validation using known variants of clinical significance in publicly available cell lines, we were able to establish the performance of variant calling pipelines in a clinical setting. BioMed Central 2021-02-24 /pmc/articles/PMC7903625/ /pubmed/33627090 http://dx.doi.org/10.1186/s12859-020-03934-3 Text en © The Author(s) 2021 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. 
The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle	Methodology Article Krishnan, Vandhana Utiramerur, Sowmithri Ng, Zena Datta, Somalee Snyder, Michael P. Ashley, Euan A. Benchmarking workflows to assess performance and suitability of germline variant calling pipelines in clinical diagnostic assays
title	Benchmarking workflows to assess performance and suitability of germline variant calling pipelines in clinical diagnostic assays
title_full	Benchmarking workflows to assess performance and suitability of germline variant calling pipelines in clinical diagnostic assays
title_fullStr	Benchmarking workflows to assess performance and suitability of germline variant calling pipelines in clinical diagnostic assays
title_full_unstemmed	Benchmarking workflows to assess performance and suitability of germline variant calling pipelines in clinical diagnostic assays
title_short	Benchmarking workflows to assess performance and suitability of germline variant calling pipelines in clinical diagnostic assays
title_sort	benchmarking workflows to assess performance and suitability of germline variant calling pipelines in clinical diagnostic assays
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7903625/ https://www.ncbi.nlm.nih.gov/pubmed/33627090 http://dx.doi.org/10.1186/s12859-020-03934-3
work_keys_str_mv	AT krishnanvandhana benchmarkingworkflowstoassessperformanceandsuitabilityofgermlinevariantcallingpipelinesinclinicaldiagnosticassays AT utiramerursowmithri benchmarkingworkflowstoassessperformanceandsuitabilityofgermlinevariantcallingpipelinesinclinicaldiagnosticassays AT ngzena benchmarkingworkflowstoassessperformanceandsuitabilityofgermlinevariantcallingpipelinesinclinicaldiagnosticassays AT dattasomalee benchmarkingworkflowstoassessperformanceandsuitabilityofgermlinevariantcallingpipelinesinclinicaldiagnosticassays AT snydermichaelp benchmarkingworkflowstoassessperformanceandsuitabilityofgermlinevariantcallingpipelinesinclinicaldiagnosticassays AT ashleyeuana benchmarkingworkflowstoassessperformanceandsuitabilityofgermlinevariantcallingpipelinesinclinicaldiagnosticassays

Similar items