Accuracy and reproducibility of somatic point mutation calling in clinical-type targeted sequencing data

BACKGROUND: Treating cancer depends in part on identifying the mutations driving each patient’s disease. Many clinical laboratories are adopting high-throughput sequencing for assaying patients’ tumours, applying targeted panels to formalin-fixed paraffin-embedded tumour tissues to detect clinically...

Bibliographic Details
Main Authors: Karimnezhad, Ali, Palidwor, Gareth A., Thavorn, Kednapa, Stewart, David J., Campbell, Pearl A., Lo, Bryan, Perkins, Theodore J.
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7560075/
https://www.ncbi.nlm.nih.gov/pubmed/33059707
http://dx.doi.org/10.1186/s12920-020-00803-z
_version_ 1783595006312841216
author Karimnezhad, Ali
Palidwor, Gareth A.
Thavorn, Kednapa
Stewart, David J.
Campbell, Pearl A.
Lo, Bryan
Perkins, Theodore J.
collection PubMed
description BACKGROUND: Treating cancer depends in part on identifying the mutations driving each patient’s disease. Many clinical laboratories are adopting high-throughput sequencing for assaying patients’ tumours, applying targeted panels to formalin-fixed paraffin-embedded tumour tissues to detect clinically-relevant mutations. While there have been some benchmarking and best practices studies of this scenario, much variant calling work focuses on whole-genome or whole-exome studies, with fresh or fresh-frozen tissue. Thus, definitive guidance on best choices for sequencing platforms, sequencing strategies, and variant calling for clinical variant detection is still being developed. METHODS: Because ground truth for clinical specimens is rarely known, we used the well-characterized Coriell cell lines GM12878 and GM12877 to generate data. We prepared samples to mimic as closely as possible clinical biopsies, including formalin fixation and paraffin embedding. We evaluated two well-known targeted sequencing panels, Illumina’s TruSight 170 hybrid-capture panel and the amplification-based Oncomine Focus panel. Sequencing was performed on an Illumina NextSeq500 and an Ion Torrent PGM respectively. We performed multiple replicates of each assay, to test reproducibility. Finally, we applied four different freely-available somatic single-nucleotide variant (SNV) callers to the data, along with the vendor-recommended callers for each sequencing platform. RESULTS: We did not observe major differences in variant calling success within the regions that each panel covers, but there were substantial differences between callers. All had high sensitivity for true SNVs, but numerous and non-overlapping false positives. Overriding certain default parameters to make them consistent between callers substantially reduced discrepancies, but still resulted in high false positive rates. 
Intersecting results from multiple replicates or from different variant callers eliminated most false positives, while maintaining sensitivity. CONCLUSIONS: Reproducibility and accuracy of targeted clinical sequencing results depend less on sequencing platform and panel than on variability between replicates and downstream bioinformatics. Differences in variant callers’ default parameters are a greater influence on algorithm disagreement than other differences between the algorithms. Contrary to typical clinical practice, we recommend employing multiple variant calling pipelines and/or analyzing replicate samples, as this greatly decreases false positive calls.
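The intersection strategy described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the `Variant` record, the function names, and the toy call sets below are hypothetical, and the sketch assumes each caller's or replicate's output has already been reduced to a set of (chromosome, position, reference allele, alternate allele) tuples.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal SNV record; real VCF records carry more fields."""
    chrom: str
    pos: int
    ref: str
    alt: str


def consensus_calls(call_sets: Iterable[set[Variant]], min_support: int) -> set[Variant]:
    """Keep variants reported by at least `min_support` call sets.

    With min_support equal to the number of call sets this is a plain
    intersection across callers or replicates.
    """
    counts: Counter = Counter()
    for calls in call_sets:
        counts.update(set(calls))  # count each variant once per call set
    return {variant for variant, n in counts.items() if n >= min_support}


def sensitivity_and_false_positives(calls: set[Variant], truth: set[Variant]) -> tuple[float, int]:
    """Return (fraction of true variants recovered, number of calls not in truth)."""
    true_positives = len(calls & truth)
    sensitivity = true_positives / len(truth) if truth else 0.0
    return sensitivity, len(calls - truth)


# Toy data, invented for illustration only.
truth = {Variant("chr1", 100, "A", "G")}
caller_a = {Variant("chr1", 100, "A", "G"), Variant("chr2", 5, "C", "T")}
caller_b = {Variant("chr1", 100, "A", "G"), Variant("chr3", 7, "G", "A")}

agreed = consensus_calls([caller_a, caller_b], min_support=2)
print(sensitivity_and_false_positives(caller_a, truth))
print(sensitivity_and_false_positives(agreed, truth))
```

In this toy case each caller alone reports one non-overlapping false positive, while the intersection keeps the true SNV and drops both. Lowering `min_support` below the number of call sets gives a majority-vote variant of the same idea.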
format Online
Article
Text
id pubmed-7560075
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7560075 2020-10-16 BMC Med Genomics Research Article BioMed Central 2020-10-15 /pmc/articles/PMC7560075/ /pubmed/33059707 http://dx.doi.org/10.1186/s12920-020-00803-z Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Accuracy and reproducibility of somatic point mutation calling in clinical-type targeted sequencing data
topic Research Article