Family-specific analysis of variant pathogenicity prediction tools

Using the presently available datasets of annotated missense variants, we ran a protein family-specific benchmarking of tools for predicting the pathogenicity of single amino acid variants. We find that despite the high overall accuracy of all tested methods, each tool has its Achilles heel, i.e. pr...

Bibliographic Details
Main authors: Zaucha, Jan, Heinzinger, Michael, Tarnovskaya, Svetlana, Rost, Burkhard, Frishman, Dmitrij
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7671395/
https://www.ncbi.nlm.nih.gov/pubmed/33575576
http://dx.doi.org/10.1093/nargab/lqaa014
_version_ 1783610920962883584
author Zaucha, Jan
Heinzinger, Michael
Tarnovskaya, Svetlana
Rost, Burkhard
Frishman, Dmitrij
author_sort Zaucha, Jan
collection PubMed
description Using currently available datasets of annotated missense variants, we performed a protein family-specific benchmark of tools for predicting the pathogenicity of single amino acid variants. We find that despite the high overall accuracy of all tested methods, each tool has its Achilles heel, i.e. protein families in which its predictions prove unreliable (expected accuracy does not exceed 51% in any method). As a proof of principle, we show that choosing the optimal tool and pathogenicity threshold individually for each protein family yields reliable predictions in all Pfam domains (accuracy no less than 68%). A functional analysis of the sets of protein domains annotated exclusively by neutral or pathogenic mutations indicates that specific protein functions can be associated with a low or high sensitivity to mutations, respectively. The highly sensitive sets of protein domains are involved in the regulation of transcription and DNA sequence-specific transcription factor binding, while the domains that do not result in disease when mutated are responsible for mediating immune and stress responses. These results suggest that future predictors of pathogenicity and especially variant prioritization tools may benefit from considering functional annotation.
format Online
Article
Text
id pubmed-7671395
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-7671395 2021-02-10 Family-specific analysis of variant pathogenicity prediction tools Zaucha, Jan Heinzinger, Michael Tarnovskaya, Svetlana Rost, Burkhard Frishman, Dmitrij NAR Genom Bioinform Methart Oxford University Press 2020-02-28 /pmc/articles/PMC7671395/ /pubmed/33575576 http://dx.doi.org/10.1093/nargab/lqaa014 Text en © The Author(s) 2019. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics.
http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Family-specific analysis of variant pathogenicity prediction tools
topic Methart