Machine-learning of complex evolutionary signals improves classification of SNVs

Conservation is a strong predictor for the pathogenicity of single-nucleotide variants (SNVs). However, some positions that present complex conservation patterns across vertebrates stray from this paradigm. Here, we analyzed the association between complex conservation patterns and the pathogenicity...

Bibliographic Details
Main authors: Labes, Sapir; Stupp, Doron; Wagner, Naama; Bloch, Idit; Lotem, Michal; L. Lahad, Ephrat; Polak, Paz; Pupko, Tal; Tabach, Yuval
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8988715/
https://www.ncbi.nlm.nih.gov/pubmed/35402908
http://dx.doi.org/10.1093/nargab/lqac025
author Labes, Sapir
Stupp, Doron
Wagner, Naama
Bloch, Idit
Lotem, Michal
L. Lahad, Ephrat
Polak, Paz
Pupko, Tal
Tabach, Yuval
collection PubMed
description Conservation is a strong predictor for the pathogenicity of single-nucleotide variants (SNVs). However, some positions that present complex conservation patterns across vertebrates stray from this paradigm. Here, we analyzed the association between complex conservation patterns and the pathogenicity of SNVs in the 115 disease-genes that had sufficient variant data. We show that conservation is not a one-rule-fits-all solution, since its accuracy depends strongly on the analyzed set of species and genes. For example, pairwise comparisons between the human and 99 vertebrate species showed that species differ in their ability to predict the clinical outcomes of variants among different genes using conservation. Furthermore, certain genes were less amenable to conservation-based variant prediction, while others demonstrated species that optimize prediction. These insights led to the development of EvoDiagnostics, which uses the conservation against each species as a feature within a random-forest machine-learning classification algorithm. EvoDiagnostics outperformed traditional conservation algorithms, deep-learning-based methods and most ensemble tools in every prediction task, highlighting the strength of optimizing conservation analysis per species and per gene. Overall, we suggest a new and more biologically relevant approach for analyzing conservation, which improves prediction of variant pathogenicity.
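The pairwise human-versus-species comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the EvoDiagnostics implementation: the sequences, species, labels, function names, and the rule "a conserved position predicts pathogenic" are all assumptions made up for illustration. Each variant gets one 0/1 conservation feature per species, and each species is then scored on how well its conservation alone matches the labels.

```python
from typing import Dict, List, Tuple

# Toy data, invented for illustration only (not taken from the paper).
HUMAN = "MKTAY"
ALIGNMENT: Dict[str, str] = {
    "mouse": "MKTAY",
    "chicken": "MRTAF",
    "zebrafish": "MKSAF",
}
# (0-based position in HUMAN, True if the variant is labeled pathogenic)
VARIANTS: List[Tuple[int, bool]] = [
    (0, True), (1, False), (2, True), (3, True), (4, False),
]


def conservation_features(
    human: str, alignment: Dict[str, str], position: int
) -> Dict[str, int]:
    """Return one feature per species: 1 if it matches the human residue, else 0."""
    return {sp: int(seq[position] == human[position]) for sp, seq in alignment.items()}


def per_species_accuracy(
    human: str, alignment: Dict[str, str], variants: List[Tuple[int, bool]]
) -> Dict[str, float]:
    """Score each species on how often its conservation agrees with the label."""
    correct = {sp: 0 for sp in alignment}
    for position, pathogenic in variants:
        features = conservation_features(human, alignment, position)
        for sp, conserved in features.items():
            # Assumed rule: a conserved position predicts "pathogenic".
            if bool(conserved) == pathogenic:
                correct[sp] += 1
    return {sp: n / len(variants) for sp, n in correct.items()}


# One feature row per variant; one accuracy value per species.
FEATURE_ROWS = [conservation_features(HUMAN, ALIGNMENT, pos) for pos, _ in VARIANTS]
ACCURACY = per_species_accuracy(HUMAN, ALIGNMENT, VARIANTS)
```

In this toy set the species differ in how well their conservation tracks the labels, which is the kind of variation the abstract reports. Rows like `FEATURE_ROWS` are the sort of per-species input a random-forest classifier could be trained on, though the actual feature encoding used by the authors is not given in this record.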
format Online
Article
Text
id pubmed-8988715
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8988715 2022-04-08 NAR Genom Bioinform Methods Article Oxford University Press 2022-04-07 /pmc/articles/PMC8988715/ /pubmed/35402908 http://dx.doi.org/10.1093/nargab/lqac025 Text en © The Author(s) 2022. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics.
https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Machine-learning of complex evolutionary signals improves classification of SNVs
topic Methods Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8988715/
https://www.ncbi.nlm.nih.gov/pubmed/35402908
http://dx.doi.org/10.1093/nargab/lqac025