Assessing performance of pathogenicity predictors using clinically relevant variant datasets

BACKGROUND: Pathogenicity predictors are integral to genomic variant interpretation but, despite their widespread usage, an independent validation of performance using a clinically relevant dataset has not been undertaken. METHODS: We derive two validation datasets: an ‘open’ dataset containing vari...

Bibliographic Details
Main authors: Gunning, Adam C, Fryer, Verity, Fasham, James, Crosby, Andrew H, Ellard, Sian, Baple, Emma L, Wright, Caroline F
Format: Online Article Text
Language: English
Published: BMJ Publishing Group 2021
Subjects: Diagnostics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8327323/
https://www.ncbi.nlm.nih.gov/pubmed/32843488
http://dx.doi.org/10.1136/jmedgenet-2020-107003
author Gunning, Adam C
Fryer, Verity
Fasham, James
Crosby, Andrew H
Ellard, Sian
Baple, Emma L
Wright, Caroline F
collection PubMed
description BACKGROUND: Pathogenicity predictors are integral to genomic variant interpretation but, despite their widespread usage, an independent validation of performance using a clinically relevant dataset has not been undertaken. METHODS: We derive two validation datasets: an ‘open’ dataset containing variants extracted from publicly available databases, similar to those commonly applied in previous benchmarking exercises, and a ‘clinically representative’ dataset containing variants identified through research/diagnostic exome and panel sequencing. Using these datasets, we evaluate the performance of three recent meta-predictors, REVEL, GAVIN and ClinPred, and compare their performance against two commonly used in silico tools, SIFT and PolyPhen-2. RESULTS: Although the newer meta-predictors outperform the older tools, the performance of all pathogenicity predictors is substantially lower in the clinically representative dataset. Using our clinically relevant dataset, REVEL performed best with an area under the receiver operating characteristic curve of 0.82. Using a concordance-based approach based on a consensus of multiple tools reduces the performance due to both discordance between tools and false concordance where tools make common misclassification. Analysis of tool feature usage may give an insight into the tool performance and misclassification. CONCLUSION: Our results support the adoption of meta-predictors over traditional in silico tools, but do not support a consensus-based approach as in current practice.
format Online
Article
Text
id pubmed-8327323
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BMJ Publishing Group
record_format MEDLINE/PubMed
spelling pubmed-8327323 2021-08-19 J Med Genet Diagnostics BMJ Publishing Group 2021-08 2020-08-25 /pmc/articles/PMC8327323/ /pubmed/32843488 http://dx.doi.org/10.1136/jmedgenet-2020-107003 Text en
© Author(s) (or their employer(s)) 2021. Re-use permitted under CC BY. Published by BMJ. https://creativecommons.org/licenses/by/4.0/
This is an open access article distributed in accordance with the Creative Commons Attribution 4.0 Unported (CC BY 4.0) license, which permits others to copy, redistribute, remix, transform and build upon this work for any purpose, provided the original work is properly cited, a link to the licence is given, and indication of whether changes were made. See: https://creativecommons.org/licenses/by/4.0/.
title Assessing performance of pathogenicity predictors using clinically relevant variant datasets
topic Diagnostics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8327323/
https://www.ncbi.nlm.nih.gov/pubmed/32843488
http://dx.doi.org/10.1136/jmedgenet-2020-107003