An ensemble learning with active sampling to predict the prognosis of postoperative non-small cell lung cancer patients

BACKGROUND: Lung cancer is the leading cause of cancer death worldwide. Prognostic prediction plays a vital role in the decision-making process for postoperative non-small cell lung cancer (NSCLC) patients. However, the high imbalance ratio of prognostic data limits the development of effective prog...

Bibliographic Details
Main authors: Hu, Danqing, Zhang, Huanyao, Li, Shaolei, Duan, Huilong, Wu, Nan, Lu, Xudong
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9487160/
https://www.ncbi.nlm.nih.gov/pubmed/36123745
http://dx.doi.org/10.1186/s12911-022-01960-0
author Hu, Danqing
Zhang, Huanyao
Li, Shaolei
Duan, Huilong
Wu, Nan
Lu, Xudong
collection PubMed
description BACKGROUND: Lung cancer is the leading cause of cancer death worldwide. Prognostic prediction plays a vital role in the decision-making process for postoperative non-small cell lung cancer (NSCLC) patients. However, the high imbalance ratio of prognostic data limits the development of effective prognostic prediction models. METHODS: In this study, we present a novel approach, ensemble learning with active sampling (ELAS), to tackle the imbalanced data problem in NSCLC prognostic prediction. ELAS first applies an active sampling mechanism that queries the most informative samples and uses them to update the base classifier, giving it a new perspective. This training process is repeated until not enough samples are queried. Next, an internal validation set is used to evaluate the base classifiers, and the best-performing ones are integrated into the ensemble model. In addition, we set up multiple initial training data seeds and internal validation sets to ensure the stability and generalization of the model. RESULTS: We verified the effectiveness of ELAS on a real clinical dataset containing 1848 postoperative NSCLC patients. Experimental results showed that ELAS achieved the best average AUROC (0.736) and AUPRC (0.453) across 6 prognostic tasks and obtained significant improvements over SVM, AdaBoost, Bagging, SMOTE and TomekLinks. CONCLUSIONS: We conclude that ELAS can effectively alleviate the imbalanced data problem in NSCLC prognostic prediction and shows good potential for future postoperative NSCLC prognostic prediction. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12911-022-01960-0.
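The training loop outlined in the METHODS portion of the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and the abstract does not specify these details: the logistic-regression base classifier, the uncertainty criterion (predicted probability close to 0.5) for "most informative" samples, the balanced seed, the probability-averaging ensemble, the synthetic data, and every function and parameter name (`elas_sketch`, `margin`, `min_queried`, `top_k`, and so on) are assumptions made for illustration only.

```python
import math
import random


def predict_proba(w, x):
    """Logistic model: w[:-1] are feature weights, w[-1] is the bias."""
    z = w[-1] + sum(wj * xj for wj, xj in zip(w, x))
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, epochs=200, lr=0.1):
    """Fit a small logistic regression by batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def auroc(scores, labels):
    """Probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def elas_sketch(X, y, X_val, y_val, seed_per_class=5, batch=10,
                margin=0.3, min_queried=3, top_k=3):
    """Train base classifiers with uncertainty-based querying, keep the best."""
    pos = [i for i, l in enumerate(y) if l == 1]
    neg = [i for i, l in enumerate(y) if l == 0]
    labeled = pos[:seed_per_class] + neg[:seed_per_class]  # assumed balanced seed
    seen = set(labeled)
    pool = [i for i in range(len(y)) if i not in seen]
    candidates = []  # (validation AUROC, weights) for each base classifier
    while True:
        w = train_logreg([X[i] for i in labeled], [y[i] for i in labeled])
        val_scores = [predict_proba(w, x) for x in X_val]
        candidates.append((auroc(val_scores, y_val), w))
        # Assumed criterion: query the pool samples closest to p = 0.5.
        uncertain = sorted((abs(predict_proba(w, X[i]) - 0.5), i) for i in pool)
        queried = [i for d, i in uncertain[:batch] if d < margin]
        if len(queried) < min_queried:
            break  # not enough samples queried: stop training
        labeled += queried
        taken = set(queried)
        pool = [i for i in pool if i not in taken]
    # Keep the base classifiers that score best on the validation set.
    candidates.sort(key=lambda c: c[0], reverse=True)
    return [w for _, w in candidates[:top_k]]


def ensemble_proba(models, x):
    """Average the predicted probabilities of the selected base classifiers."""
    return sum(predict_proba(w, x) for w in models) / len(models)


def make_toy_data(n, pos_rate, rng):
    """Synthetic imbalanced 2-D data; purely illustrative, not clinical."""
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < pos_rate else 0
        center = 1.5 if label else 0.0
        X.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
        y.append(label)
    return X, y
```

A possible usage, again on synthetic data only: draw a training pool and a validation set with `make_toy_data(300, 0.15, random.Random(0))`, call `elas_sketch` on them, and score new cases with `ensemble_proba`. The abstract also mentions multiple initial seeds and internal validation sets; repeating the call with different seeds and splits and pooling the selected models would be one way to approximate that, though the record does not say how the authors combine them.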
format Online
Article
Text
id pubmed-9487160
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-9487160 2022-09-21 An ensemble learning with active sampling to predict the prognosis of postoperative non-small cell lung cancer patients Hu, Danqing Zhang, Huanyao Li, Shaolei Duan, Huilong Wu, Nan Lu, Xudong BMC Med Inform Decis Mak Research
BioMed Central 2022-09-19 /pmc/articles/PMC9487160/ /pubmed/36123745 http://dx.doi.org/10.1186/s12911-022-01960-0 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title An ensemble learning with active sampling to predict the prognosis of postoperative non-small cell lung cancer patients
topic Research