The Evaluation of Tools Used to Predict the Impact of Missense Variants Is Hindered by Two Types of Circularity

Prioritizing missense variants for further experimental investigation is a key challenge in current sequencing studies for exploring complex and Mendelian diseases. A large number of in silico tools have been employed for the task of pathogenicity prediction, including PolyPhen‐2, SIFT, FatHMM, Muta...

Bibliographic Details
Main authors: Grimm, Dominik G., Azencott, Chloé‐Agathe, Aicheler, Fabian, Gieraths, Udo, MacArthur, Daniel G., Samocha, Kaitlin E., Cooper, David N., Stenson, Peter D., Daly, Mark J., Smoller, Jordan W., Duncan, Laramie E., Borgwardt, Karsten M.
Format: Online Article Text
Language: English
Published: John Wiley and Sons Inc. 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4409520/
https://www.ncbi.nlm.nih.gov/pubmed/25684150
http://dx.doi.org/10.1002/humu.22768
author Grimm, Dominik G.
Azencott, Chloé‐Agathe
Aicheler, Fabian
Gieraths, Udo
MacArthur, Daniel G.
Samocha, Kaitlin E.
Cooper, David N.
Stenson, Peter D.
Daly, Mark J.
Smoller, Jordan W.
Duncan, Laramie E.
Borgwardt, Karsten M.
collection PubMed
description Prioritizing missense variants for further experimental investigation is a key challenge in current sequencing studies for exploring complex and Mendelian diseases. A large number of in silico tools have been employed for the task of pathogenicity prediction, including PolyPhen‐2, SIFT, FatHMM, MutationTaster‐2, MutationAssessor, Combined Annotation Dependent Depletion, LRT, phyloP, and GERP++, as well as optimized methods of combining tool scores, such as Condel and Logit. Due to the wealth of these methods, an important practical question to answer is which of these tools generalize best, that is, correctly predict the pathogenic character of new variants. We here demonstrate in a study of 10 tools on five datasets that such a comparative evaluation of these tools is hindered by two types of circularity: they arise due to (1) the same variants or (2) different variants from the same protein occurring both in the datasets used for training and for evaluation of these tools, which may lead to overly optimistic results. We show that comparative evaluations of predictors that do not address these types of circularity may erroneously conclude that circularity confounded tools are most accurate among all tools, and may even outperform optimized combinations of tools.
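The two kinds of overlap named in the description can be checked mechanically before a benchmark is run. The following is a minimal sketch, not the authors' procedure: the `Variant` record, its `protein` and `change` fields, and the function `circularity_report` are hypothetical names introduced here for illustration. It assumes each variant is identified by a protein identifier plus an amino-acid change, and it splits an evaluation set into variants that also appear in the training set, variants that are new but come from a protein seen in training, and variants from proteins absent from training.

```python
from typing import Dict, Iterable, List, NamedTuple


class Variant(NamedTuple):
    # Hypothetical minimal record; real datasets carry many more fields.
    protein: str  # protein identifier (assumed format)
    change: str   # amino-acid substitution, e.g. "A10V" (assumed format)


def circularity_report(
    train: Iterable[Variant], test: Iterable[Variant]
) -> Dict[str, List[Variant]]:
    """Partition evaluation variants by how they overlap with the training data."""
    train_variants = set(train)
    train_proteins = {v.protein for v in train_variants}

    report: Dict[str, List[Variant]] = {
        "same_variant": [],  # (1) identical variant is in the training set
        "same_protein": [],  # (2) new variant, but its protein is in the training set
        "clean": [],         # protein does not occur in the training set
    }
    for v in test:
        if v in train_variants:
            report["same_variant"].append(v)
        elif v.protein in train_proteins:
            report["same_protein"].append(v)
        else:
            report["clean"].append(v)
    return report
```

Under these assumptions, restricting an evaluation to the `clean` group is one way to keep both kinds of overlap out of a comparison; whether that leaves enough variants to draw conclusions depends on the datasets involved.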
format Online
Article
Text
id pubmed-4409520
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher John Wiley and Sons Inc.
record_format MEDLINE/PubMed
spelling pubmed-4409520 2016-05-01 Hum Mutat Research Articles John Wiley and Sons Inc. 2015-03-26 2015-05 /pmc/articles/PMC4409520/ /pubmed/25684150 http://dx.doi.org/10.1002/humu.22768 Text en © 2015 The Authors. Human Mutation published by Wiley Periodicals, Inc.
This is an open access article under the terms of the Creative Commons Attribution‐NonCommercial 4.0 (http://creativecommons.org/licenses/by-nc/4.0/) License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes.
title The Evaluation of Tools Used to Predict the Impact of Missense Variants Is Hindered by Two Types of Circularity
topic Research Articles