Random survival forest model identifies novel biomarkers of event-free survival in high-risk pediatric acute lymphoblastic leukemia

High-risk pediatric B-ALL patients experience 5-year negative event rates up to 25%. Although some biomarkers of relapse are utilized in the clinic, their ability to predict outcomes in high-risk patients is limited. Here, we propose a random survival forest (RSF) machine learning model utilizing in...

Bibliographic Details
Main Authors: Bohannan, Zachary S., Coffman, Frederick, Mitrofanova, Antonina
Format: Online Article Text
Language: English
Published: Research Network of Computational and Structural Biotechnology 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8777142/
https://www.ncbi.nlm.nih.gov/pubmed/35116134
http://dx.doi.org/10.1016/j.csbj.2022.01.003
_version_ 1784636999269875712
author Bohannan, Zachary S.
Coffman, Frederick
Mitrofanova, Antonina
author_facet Bohannan, Zachary S.
Coffman, Frederick
Mitrofanova, Antonina
author_sort Bohannan, Zachary S.
collection PubMed
description High-risk pediatric B-ALL patients experience 5-year negative event rates up to 25%. Although some biomarkers of relapse are utilized in the clinic, their ability to predict outcomes in high-risk patients is limited. Here, we propose a random survival forest (RSF) machine learning model utilizing interpretable genomic inputs to predict relapse/death in high-risk pediatric B-ALL patients. We utilized whole exome sequencing profiles from 156 patients in the TARGET-ALL study (with samples collected at presentation) further stratified into training and test cohorts (109 and 47 patients, respectively). To avoid overfitting and facilitate the interpretation of machine learning results, input genomic variables were engineered using a stepwise approach involving univariable Cox models to select variables directly associated with outcomes, genomic coordinate-based analysis to select mutational hotspots, and correlation analysis to eliminate feature co-linearity. Model training identified 7 genomic regions most predictive of relapse/death-free survival. The test cohort error rate was 12.47%, and a polygenic score based on the sum of the top 7 variables effectively stratified patients into two groups, with significant differences in time to relapse/death (log-rank P = 0.001, hazard ratio = 5.41). Our model outperformed other EFS modeling approaches including an RSF using gold-standard prognostic variables (error rate = 24.35%). Validation in 174 standard-risk patients and 3 patients who failed to respond to induction therapy confirmed that our RSF model and polygenic score were specific to high-risk disease. We propose that our feature selection/engineering approach can increase the clinical interpretability of RSF, and our polygenic score could be utilized to enhance clinical decision-making in high-risk B-ALL.
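The polygenic score mentioned in the description (a sum over the selected variables, used to split patients into two groups) can be illustrated with a minimal sketch. This is not the authors' implementation: the region names, the toy 0/1 mutation indicators, the patient IDs, and the cutoff of 2 are all hypothetical, since the record names neither the 7 regions nor the threshold used.

```python
from typing import Dict, List


def polygenic_score(features: Dict[str, int], selected: List[str]) -> int:
    """Sum the values of the selected variables for one patient."""
    return sum(features.get(name, 0) for name in selected)


def stratify(scores: Dict[str, int], cutoff: int) -> Dict[str, str]:
    """Label each patient 'high' if score >= cutoff, otherwise 'low'."""
    return {pid: ("high" if s >= cutoff else "low") for pid, s in scores.items()}


# Hypothetical region names and toy indicators (1 = variant present).
SELECTED = ["region_1", "region_2", "region_3"]
patients = {
    "P1": {"region_1": 1, "region_2": 0, "region_3": 1},
    "P2": {"region_1": 0, "region_2": 0, "region_3": 0},
    "P3": {"region_1": 1, "region_2": 1, "region_3": 1},
}

# Score every patient, then split into two groups at an assumed cutoff.
scores = {pid: polygenic_score(f, SELECTED) for pid, f in patients.items()}
groups = stratify(scores, cutoff=2)
```

In a real analysis the two groups would then be compared with a log-rank test on time to relapse/death, as the description reports.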
format Online
Article
Text
id pubmed-8777142
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Research Network of Computational and Structural Biotechnology
record_format MEDLINE/PubMed
spelling pubmed-87771422022-02-02 Random survival forest model identifies novel biomarkers of event-free survival in high-risk pediatric acute lymphoblastic leukemia Bohannan, Zachary S. Coffman, Frederick Mitrofanova, Antonina Comput Struct Biotechnol J Research Article High-risk pediatric B-ALL patients experience 5-year negative event rates up to 25%. Although some biomarkers of relapse are utilized in the clinic, their ability to predict outcomes in high-risk patients is limited. Here, we propose a random survival forest (RSF) machine learning model utilizing interpretable genomic inputs to predict relapse/death in high-risk pediatric B-ALL patients. We utilized whole exome sequencing profiles from 156 patients in the TARGET-ALL study (with samples collected at presentation) further stratified into training and test cohorts (109 and 47 patients, respectively). To avoid overfitting and facilitate the interpretation of machine learning results, input genomic variables were engineered using a stepwise approach involving univariable Cox models to select variables directly associated with outcomes, genomic coordinate-based analysis to select mutational hotspots, and correlation analysis to eliminate feature co-linearity. Model training identified 7 genomic regions most predictive of relapse/death-free survival. The test cohort error rate was 12.47%, and a polygenic score based on the sum of the top 7 variables effectively stratified patients into two groups, with significant differences in time to relapse/death (log-rank P = 0.001, hazard ratio = 5.41). Our model outperformed other EFS modeling approaches including an RSF using gold-standard prognostic variables (error rate = 24.35%). Validation in 174 standard-risk patients and 3 patients who failed to respond to induction therapy confirmed that our RSF model and polygenic score were specific to high-risk disease. 
We propose that our feature selection/engineering approach can increase the clinical interpretability of RSF, and our polygenic score could be utilized to enhance clinical decision-making in high-risk B-ALL. Research Network of Computational and Structural Biotechnology 2022-01-06 /pmc/articles/PMC8777142/ /pubmed/35116134 http://dx.doi.org/10.1016/j.csbj.2022.01.003 Text en © 2022 The Authors https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
spellingShingle Research Article
Bohannan, Zachary S.
Coffman, Frederick
Mitrofanova, Antonina
Random survival forest model identifies novel biomarkers of event-free survival in high-risk pediatric acute lymphoblastic leukemia
title Random survival forest model identifies novel biomarkers of event-free survival in high-risk pediatric acute lymphoblastic leukemia
title_full Random survival forest model identifies novel biomarkers of event-free survival in high-risk pediatric acute lymphoblastic leukemia
title_fullStr Random survival forest model identifies novel biomarkers of event-free survival in high-risk pediatric acute lymphoblastic leukemia
title_full_unstemmed Random survival forest model identifies novel biomarkers of event-free survival in high-risk pediatric acute lymphoblastic leukemia
title_short Random survival forest model identifies novel biomarkers of event-free survival in high-risk pediatric acute lymphoblastic leukemia
title_sort random survival forest model identifies novel biomarkers of event-free survival in high-risk pediatric acute lymphoblastic leukemia
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8777142/
https://www.ncbi.nlm.nih.gov/pubmed/35116134
http://dx.doi.org/10.1016/j.csbj.2022.01.003
work_keys_str_mv AT bohannanzacharys randomsurvivalforestmodelidentifiesnovelbiomarkersofeventfreesurvivalinhighriskpediatricacutelymphoblasticleukemia
AT coffmanfrederick randomsurvivalforestmodelidentifiesnovelbiomarkersofeventfreesurvivalinhighriskpediatricacutelymphoblasticleukemia
AT mitrofanovaantonina randomsurvivalforestmodelidentifiesnovelbiomarkersofeventfreesurvivalinhighriskpediatricacutelymphoblasticleukemia