
Machine Learning Approach for the Prediction of Age-Specific Probability of SCA3 and DRPLA by Survival Curve Analysis


Bibliographic details
Main authors: Hatano, Yuya, Ishihara, Tomohiko, Hirokawa, Sachiko, Onodera, Osamu
Format: Online article (text)
Language: English
Published: Wolters Kluwer 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10159758/
https://www.ncbi.nlm.nih.gov/pubmed/37152445
http://dx.doi.org/10.1212/NXG.0000000000200075
collection PubMed
description BACKGROUND AND OBJECTIVES: As the number of CAG repeats in the expansion increases, polyglutamine diseases tend to manifest at a younger age. Based on this relationship, attempts have been made to predict age at onset by parametric survival analysis, but a more accurate prediction method is still needed. In this study, we examined 2 machine learning methods and 6 conventional parametric methods of survival analysis for spinocerebellar ataxia type 3 (SCA3) and dentatorubral-pallidoluysian atrophy (DRPLA).
METHODS: We compared the performance of 2 machine learning methods of survival analysis (random survival forest [RSF] and DeepSurv) and 6 parametric survival models (Weibull, exponential, Gaussian, logistic, log-logistic, and log-Gaussian). Models were trained and evaluated with leave-one-out cross-validation, and the evaluation criteria included root mean squared error (RMSE), mean absolute error (MAE), and the integrated Brier score. The integrated Brier score served as the primary end point, and the best-performing survival model was used to predict the probability of remaining asymptomatic.
RESULTS: Among the models examined, the machine learning methods (RSF and DeepSurv) predicted more accurately than the parametric methods. For both SCA3 and DRPLA, RSF outperformed DeepSurv on RMSE (SCA3: 7.37, DRPLA: 10.78), MAE (SCA3: 5.52, DRPLA: 8.17), and integrated Brier score (SCA3: 0.05, DRPLA: 0.077). Using RSF, we determined the age-specific probability distribution of age at onset based on CAG repeat size and current age.
DISCUSSION: This study demonstrates the superiority of machine learning survival methods for predicting age at onset of SCA3 and DRPLA. Such accurate prediction of onset will be useful for genetic counseling of carriers and for designing ways to verify the effects of interventions in unaffected individuals.
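The abstract names the integrated Brier score as the primary end point and reports an age-specific probability of onset conditioned on current age, but gives no formulas. Below is a minimal Python sketch of both ideas, assuming NumPy, right-censored data (carriers still asymptomatic at last follow-up are treated as censored), and inverse-probability-of-censoring weighting in the style of Graf et al. It is not the authors' code: every function name, and the Weibull parameters in the usage example, are hypothetical. The conditional probability uses the standard identity P(onset by t | asymptomatic at a) = 1 - S(t)/S(a), which holds for any survival curve, including the step functions an RSF produces.

```python
import numpy as np


def censoring_km(times, events):
    """Kaplan-Meier estimate of the censoring survival function G(t).

    Censored observations (event == 0) play the role of 'events' here.
    Returns the distinct observed times and G evaluated at each of them.
    """
    uniq = np.unique(times)
    g_values = np.empty(len(uniq))
    g = 1.0
    for k, u in enumerate(uniq):
        at_risk = np.sum(times >= u)
        censored_here = np.sum((times == u) & (events == 0))
        g *= 1.0 - censored_here / at_risk
        g_values[k] = g
    return uniq, g_values


def step_eval(uniq, values, t, left_limit=False):
    """Evaluate a right-continuous step function that equals 1.0 before its first jump."""
    side = "left" if left_limit else "right"
    idx = np.searchsorted(uniq, t, side=side) - 1
    return 1.0 if idx < 0 else float(values[idx])


def brier_score(t, times, events, surv_at_t, uniq, g_values):
    """IPCW Brier score at time t; surv_at_t[i] is the predicted P(still asymptomatic at t)."""
    total = 0.0
    for t_i, d_i, s_i in zip(times, events, surv_at_t):
        if t_i <= t and d_i == 1:
            # Onset observed by t: ideal prediction is 0; weight by 1 / G(T_i-).
            total += s_i ** 2 / step_eval(uniq, g_values, t_i, left_limit=True)
        elif t_i > t:
            # Still asymptomatic at t: ideal prediction is 1; weight by 1 / G(t).
            total += (1.0 - s_i) ** 2 / step_eval(uniq, g_values, t)
        # Subjects censored before t contribute 0; the weights account for them.
    return total / len(times)


def integrated_brier_score(grid, times, events, surv_matrix):
    """Trapezoidal average of the Brier score over a time grid.

    Assumption: surv_matrix[i, j] is the predicted S(grid[j] | x_i).
    """
    grid = np.asarray(grid, dtype=float)
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    uniq, g_values = censoring_km(times, events)
    bs = np.array([
        brier_score(t, times, events, surv_matrix[:, j], uniq, g_values)
        for j, t in enumerate(grid)
    ])
    area = np.sum((bs[1:] + bs[:-1]) / 2.0 * np.diff(grid))
    return float(area / (grid[-1] - grid[0]))


def conditional_onset_probability(surv_fn, current_age, target_age):
    """P(onset by target_age | asymptomatic at current_age) = 1 - S(target) / S(current)."""
    return 1.0 - surv_fn(target_age) / surv_fn(current_age)


if __name__ == "__main__":
    # Hypothetical Weibull survival curve (scale 50 y, shape 4); parameters are made up.
    def weibull_survival(t, scale=50.0, shape=4.0):
        return np.exp(-((t / scale) ** shape))

    # Probability of onset by age 50 for a carrier still asymptomatic at 40.
    print(round(conditional_onset_probability(weibull_survival, 40.0, 50.0), 3))  # 0.446
```

Averaging the Brier score over the grid with the trapezoidal rule is one common convention; implementations differ in the time range and weighting, so values from this sketch need not match the scores reported above.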
id pubmed-10159758
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Neurol Genet, Research Article, Wolters Kluwer, 2023-05-04
rights Copyright © 2023 The Author(s). Published by Wolters Kluwer Health, Inc. on behalf of the American Academy of Neurology. This is an open access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND) (https://creativecommons.org/licenses/by-nc-nd/4.0/), which permits downloading and sharing the work provided it is properly cited. The work cannot be changed in any way or used commercially without permission from the journal.
topic Research Article