
Variant effect prediction tools assessed using independent, functional assay-based datasets: implications for discovery and diagnostics

BACKGROUND: Genetic variant effect prediction algorithms are used extensively in clinical genomics and research to determine the likely consequences of amino acid substitutions on protein function. It is vital that we better understand their accuracies and limitations because published performance m...


Bibliographic Details
Main Authors: Mahmood, Khalid, Jung, Chol-hee, Philip, Gayle, Georgeson, Peter, Chung, Jessica, Pope, Bernard J., Park, Daniel J.
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5433009/
https://www.ncbi.nlm.nih.gov/pubmed/28511696
http://dx.doi.org/10.1186/s40246-017-0104-8
_version_ 1783236757928542208
author Mahmood, Khalid
Jung, Chol-hee
Philip, Gayle
Georgeson, Peter
Chung, Jessica
Pope, Bernard J.
Park, Daniel J.
author_facet Mahmood, Khalid
Jung, Chol-hee
Philip, Gayle
Georgeson, Peter
Chung, Jessica
Pope, Bernard J.
Park, Daniel J.
author_sort Mahmood, Khalid
collection PubMed
description BACKGROUND: Genetic variant effect prediction algorithms are used extensively in clinical genomics and research to determine the likely consequences of amino acid substitutions on protein function. It is vital that we better understand their accuracies and limitations because published performance metrics are confounded by serious problems of circularity and error propagation. Here, we derive three independent, functionally determined human mutation datasets, UniFun, BRCA1-DMS and TP53-TA, and employ them, alongside previously described datasets, to assess the pre-eminent variant effect prediction tools. RESULTS: Apparent accuracies of variant effect prediction tools were influenced significantly by the benchmarking dataset. Benchmarking with the assay-determined datasets UniFun and BRCA1-DMS yielded areas under the receiver operating characteristic curves in the modest ranges of 0.52 to 0.63 and 0.54 to 0.75, respectively, considerably lower than observed for other, potentially more conflicted datasets. CONCLUSIONS: These results raise concerns about how such algorithms should be employed, particularly in a clinical setting. Contemporary variant effect prediction tools are unlikely to be as accurate at the general prediction of functional impacts on proteins as reported prior. Use of functional assay-based datasets that avoid prior dependencies promises to be valuable for the ongoing development and accurate benchmarking of such tools. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s40246-017-0104-8) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5433009
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5433009 2017-05-17 Variant effect prediction tools assessed using independent, functional assay-based datasets: implications for discovery and diagnostics Mahmood, Khalid Jung, Chol-hee Philip, Gayle Georgeson, Peter Chung, Jessica Pope, Bernard J. Park, Daniel J. Hum Genomics Primary Research BACKGROUND: Genetic variant effect prediction algorithms are used extensively in clinical genomics and research to determine the likely consequences of amino acid substitutions on protein function. It is vital that we better understand their accuracies and limitations because published performance metrics are confounded by serious problems of circularity and error propagation. Here, we derive three independent, functionally determined human mutation datasets, UniFun, BRCA1-DMS and TP53-TA, and employ them, alongside previously described datasets, to assess the pre-eminent variant effect prediction tools. RESULTS: Apparent accuracies of variant effect prediction tools were influenced significantly by the benchmarking dataset. Benchmarking with the assay-determined datasets UniFun and BRCA1-DMS yielded areas under the receiver operating characteristic curves in the modest ranges of 0.52 to 0.63 and 0.54 to 0.75, respectively, considerably lower than observed for other, potentially more conflicted datasets. CONCLUSIONS: These results raise concerns about how such algorithms should be employed, particularly in a clinical setting. Contemporary variant effect prediction tools are unlikely to be as accurate at the general prediction of functional impacts on proteins as reported prior. Use of functional assay-based datasets that avoid prior dependencies promises to be valuable for the ongoing development and accurate benchmarking of such tools. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s40246-017-0104-8) contains supplementary material, which is available to authorized users.
BioMed Central 2017-05-16 /pmc/articles/PMC5433009/ /pubmed/28511696 http://dx.doi.org/10.1186/s40246-017-0104-8 Text en © The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Primary Research
Mahmood, Khalid
Jung, Chol-hee
Philip, Gayle
Georgeson, Peter
Chung, Jessica
Pope, Bernard J.
Park, Daniel J.
Variant effect prediction tools assessed using independent, functional assay-based datasets: implications for discovery and diagnostics
title Variant effect prediction tools assessed using independent, functional assay-based datasets: implications for discovery and diagnostics
title_full Variant effect prediction tools assessed using independent, functional assay-based datasets: implications for discovery and diagnostics
title_fullStr Variant effect prediction tools assessed using independent, functional assay-based datasets: implications for discovery and diagnostics
title_full_unstemmed Variant effect prediction tools assessed using independent, functional assay-based datasets: implications for discovery and diagnostics
title_short Variant effect prediction tools assessed using independent, functional assay-based datasets: implications for discovery and diagnostics
title_sort variant effect prediction tools assessed using independent, functional assay-based datasets: implications for discovery and diagnostics
topic Primary Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5433009/
https://www.ncbi.nlm.nih.gov/pubmed/28511696
http://dx.doi.org/10.1186/s40246-017-0104-8