Generalising better: Applying deep learning to integrate deleteriousness prediction scores for whole-exome SNV studies

Bibliographic details
Main authors: Korvigo, Ilia; Afanasyev, Andrey; Romashchenko, Nikolay; Skoblov, Mikhail
Format: Online, Article, Text
Language: English
Published: Public Library of Science, 2018
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5851551/
https://www.ncbi.nlm.nih.gov/pubmed/29538399
http://dx.doi.org/10.1371/journal.pone.0192829
collection PubMed
description Many automatic classifiers have been introduced to aid inference of the phenotypical effects of uncategorised nsSNVs (nonsynonymous Single Nucleotide Variations) in theoretical and medical applications. Lately, several meta-estimators have been proposed that combine different predictors, such as PolyPhen and SIFT, to integrate more information in a single score. Although many advances have been made in feature design and in the machine learning algorithms used, the shortage of high-quality reference data, along with the bias towards intensively studied in vitro models, calls for improved generalisation ability in order to further increase classification accuracy and handle records with insufficient data. Since a meta-estimator essentially combines different scoring systems with highly complicated nonlinear relationships, we investigated how deep learning (supervised and unsupervised), which is particularly efficient at discovering hierarchies of features, can improve classification performance. While it is commonly believed that one should only use deep learning for high-dimensional input spaces and other models (logistic regression, support vector machines, Bayesian classifiers, etc.) for simpler inputs, we still believe that the ability of neural networks to discover intricate structure in highly heterogeneous datasets can aid a meta-estimator. We compare the performance with various popular predictors, many of which are recommended by the American College of Medical Genetics and Genomics (ACMG), as well as with available deep learning-based predictors. Thanks to hardware acceleration, we were able to use a computationally expensive genetic algorithm to stochastically optimise hyper-parameters over many generations. Overfitting was hindered by noise injection and dropout, which limit co-adaptation of hidden units. We stress that this work was conceived not as a tool comparison but as an exploration of the possibilities of applying deep learning to ensemble scores; nevertheless, our results show that even relatively simple modern neural networks can significantly improve both prediction accuracy and coverage. We provide open access to our best model via the website: http://score.generesearch.ru/services/badmut/.
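The abstract describes a meta-estimator that feeds several existing deleteriousness scores into a neural network and curbs overfitting with noise injection and dropout. The pure-Python code below is a minimal sketch of that general idea, not the authors' model: the class name TinyMetaEstimator, the single ReLU hidden layer, plain SGD, the three-score input, every hyper-parameter value and the toy labelling rule are hypothetical assumptions chosen only for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class TinyMetaEstimator:
    """Hypothetical one-hidden-layer network that merges several per-variant
    scores (assumed pre-scaled to [0, 1]) into one deleteriousness probability."""

    def __init__(self, n_inputs, n_hidden=8, dropout=0.2, noise_std=0.05, lr=0.1, seed=0):
        self.rng = random.Random(seed)
        self.dropout, self.noise_std, self.lr = dropout, noise_std, lr
        self.w1 = [[self.rng.gauss(0.0, 0.5) for _ in range(n_inputs)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [self.rng.gauss(0.0, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _forward(self, x, train):
        if train:
            # Noise injection: perturb the input scores with Gaussian noise.
            x = [xi + self.rng.gauss(0.0, self.noise_std) for xi in x]
        pre = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(self.w1, self.b1)]
        h = [max(0.0, p) for p in pre]  # ReLU activations
        mask = [1.0] * len(h)
        if train and self.dropout > 0:
            # Inverted dropout: zero each hidden unit with probability `dropout`
            # and rescale the survivors so the expected activation is unchanged.
            keep = 1.0 - self.dropout
            mask = [1.0 / keep if self.rng.random() < keep else 0.0 for _ in h]
        h = [hi * m for hi, m in zip(h, mask)]
        p = sigmoid(sum(w * hi for w, hi in zip(self.w2, h)) + self.b2)
        return x, pre, mask, h, p

    def train_step(self, x, y):
        """One SGD step on binary cross-entropy for a single labelled variant."""
        x, pre, mask, h, p = self._forward(x, train=True)
        d_logit = p - y  # dLoss/dlogit for sigmoid + cross-entropy
        for j in range(len(h)):
            # Backpropagate through w2, the dropout mask and the ReLU (before updating w2).
            d_pre = d_logit * self.w2[j] * mask[j] * (1.0 if pre[j] > 0 else 0.0)
            self.w2[j] -= self.lr * d_logit * h[j]
            for i in range(len(x)):
                self.w1[j][i] -= self.lr * d_pre * x[i]
            self.b1[j] -= self.lr * d_pre
        self.b2 -= self.lr * d_logit

    def predict_proba(self, x):
        # Inference: no noise and no dropout, so the output is deterministic.
        return self._forward(x, train=False)[-1]


def make_toy_data(n, seed):
    # Hypothetical toy data: three scores in [0, 1]; label 1 ("deleterious")
    # when their mean exceeds 0.5. Not real variant data.
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.random() for _ in range(3)]
        data.append((x, 1 if sum(x) / 3 > 0.5 else 0))
    return data


train_set = make_toy_data(500, seed=1)
test_set = make_toy_data(200, seed=2)
model = TinyMetaEstimator(n_inputs=3)
shuffler = random.Random(3)
for _ in range(30):  # epochs
    shuffler.shuffle(train_set)
    for x, y in train_set:
        model.train_step(x, y)

accuracy = sum((model.predict_proba(x) > 0.5) == (y == 1) for x, y in test_set) / len(test_set)
print(f"held-out accuracy on toy data: {accuracy:.2f}")
```

Gaussian noise is added only to the inputs during training, and inverted dropout rescales the surviving hidden units, so predict_proba needs no correction at inference time. Settings such as n_hidden, dropout and noise_std are the kind of hyper-parameters a search procedure, for example the genetic algorithm mentioned in the abstract, could tune; which parameters the authors actually optimised is not stated in this record.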
id pubmed-5851551
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-5851551 (2018-03-23). PLoS One, Research Article. Public Library of Science, 2018-03-14. /pmc/articles/PMC5851551/ /pubmed/29538399 http://dx.doi.org/10.1371/journal.pone.0192829 Text, en. © 2018 Korvigo et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
topic Research Article