Machine learning for prediction of schizophrenia using genetic and demographic factors in the UK biobank

Machine learning (ML) holds promise for precision psychiatry, but its predictive performance is unclear. We assessed whether ML provided added value over logistic regression for prediction of schizophrenia, and compared models built using polygenic risk scores (PRS) or clinical/demographic factors....

Bibliographic Details
Main authors: Bracher-Smith, Matthew; Rees, Elliott; Menzies, Georgina; Walters, James T.R.; O'Donovan, Michael C.; Owen, Michael J.; Kirov, George; Escott-Price, Valentina
Format: Online Article Text
Language: English
Published: Elsevier Science Publisher B. V 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9399753/
https://www.ncbi.nlm.nih.gov/pubmed/35779327
http://dx.doi.org/10.1016/j.schres.2022.06.006
author Bracher-Smith, Matthew
Rees, Elliott
Menzies, Georgina
Walters, James T.R.
O'Donovan, Michael C.
Owen, Michael J.
Kirov, George
Escott-Price, Valentina
author_sort Bracher-Smith, Matthew
collection PubMed
description Machine learning (ML) holds promise for precision psychiatry, but its predictive performance is unclear. We assessed whether ML provided added value over logistic regression for prediction of schizophrenia, and compared models built using polygenic risk scores (PRS) or clinical/demographic factors. LASSO and ridge-penalised logistic regression, support vector machines (SVM), random forests, boosting, neural networks and stacked models were trained to predict schizophrenia, using PRS for schizophrenia (PRS(SZ)), sex, parental depression, educational attainment, winter birth, handedness and number of siblings as predictors. Models were evaluated for discrimination using area under the receiver operator characteristic curve (AUROC) and relative importance of predictors using permutation feature importance (PFI). In a secondary analysis, fitted models were tested for association with schizophrenia-related traits which had not been used in model development. Following learning curve analysis, 738 cases and 3690 randomly sampled controls were selected from the UK Biobank. ML models combining all predictors showed the highest discrimination (linear SVM, AUROC = 0.71), but did not significantly outperform logistic regression. AUROC was robust over 100 random resamples of controls. PFI identified PRS(SZ) as the most important predictor. Highest variance in fitted models was explained by schizophrenia-related traits including fluid intelligence (most associated: linear SVM), digit symbol substitution (RBF SVM), BMI (XGBoost), smoking status (XGBoost) and deprivation (linear SVM). In conclusion, ML approaches did not provide substantial added value for prediction of schizophrenia over logistic regression, as indexed by AUROC; however, risk scores derived with different ML approaches differ with respect to association with schizophrenia-related traits.
format Online
Article
Text
id pubmed-9399753
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Elsevier Science Publisher B. V
record_format MEDLINE/PubMed
spelling pubmed-9399753 2022-08-25 Schizophr Res Article Elsevier Science Publisher B. V 2022-08 /pmc/articles/PMC9399753/ /pubmed/35779327 http://dx.doi.org/10.1016/j.schres.2022.06.006 Text en © 2022 The Authors. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
title Machine learning for prediction of schizophrenia using genetic and demographic factors in the UK biobank
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9399753/
https://www.ncbi.nlm.nih.gov/pubmed/35779327
http://dx.doi.org/10.1016/j.schres.2022.06.006