
A coupling approach of a predictor and a descriptor for breast cancer prognosis

BACKGROUND: In cancer prognosis research, diverse machine learning models have been applied to the problems of cancer susceptibility (risk assessment), cancer recurrence (redevelopment of cancer after resolution), and cancer survivability, regarding an accuracy (or an AUC--the area under the ROC curve) a...


Bibliographic Details
Main authors: Shin, Hyunjung; Nam, Yonghyun
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4101306/
https://www.ncbi.nlm.nih.gov/pubmed/25080202
http://dx.doi.org/10.1186/1755-8794-7-S1-S4
author Shin, Hyunjung
Nam, Yonghyun
collection PubMed
description BACKGROUND: In cancer prognosis research, diverse machine learning models have been applied to the problems of cancer susceptibility (risk assessment), cancer recurrence (redevelopment of cancer after resolution), and cancer survivability, with accuracy (or AUC, the area under the ROC curve) regarded as the primary measure for evaluating model performance. However, to help medical specialists establish a treatment plan from the predicted output of a model, it is more pragmatic to elucidate which variables (markers) most significantly influenced the resulting cancer outcome and which patients show similar patterns. METHODS: In this study, a coupling approach of two sub-modules, a predictor and a descriptor, is proposed. The predictor module generates the predicted output for the cancer outcome; a semi-supervised co-training algorithm is employed as the predictor. The descriptor module post-processes the results of the predictor module, focusing mainly on which variables rank higher or lower in significance when describing the prediction results, and on how patients are segmented into groups according to the patterns they share. Decision trees are used as the descriptor. RESULTS: The proposed 'predictor-descriptor' approach was tested on the breast cancer survivability problem using the Surveillance, Epidemiology, and End Results (SEER) database for breast cancer. The results present a performance comparison among established machine learning algorithms, the ranking of prognosis elements for breast cancer, and patient segmentation. In the performance comparison among the predictor candidates, the semi-supervised co-training algorithm performed best, producing an average AUC of 0.81.
The descriptor module then identified the top-tier prognosis markers that significantly affect the classification of patients as survived or dead, including 'lymph node involvement', 'stage', 'site-specific surgery', 'number of positive nodes examined', and 'tumor size'. A typical example of patient segmentation was also provided: the patients classified as dead were grouped into two segments according to differences in their prognostic profiles, one with serious pathologic exam results and the other with frailty due to age.
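The abstract reports model performance as AUC (the area under the ROC curve). As a minimal sketch of what that number measures, the function below computes AUC as the fraction of (positive, negative) pairs that a model's scores rank correctly, counting ties as half. The function name `auc`, the label coding (1 = positive class, 0 = negative class), and the toy data are illustrative assumptions; none of them come from the article or from SEER.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise ranking.

    labels: iterable of 0/1 class labels (assumed coding: 1 = positive).
    scores: iterable of model scores, higher meaning "more likely positive".
    Returns the fraction of (positive, negative) pairs in which the
    positive example has the higher score; a tie counts as half a pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")

    correctly_ranked = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correctly_ranked += 1.0
            elif p == n:
                correctly_ranked += 0.5
    return correctly_ranked / (len(pos) * len(neg))


# Toy data (hypothetical, for illustration only).
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
toy_auc = auc(toy_labels, toy_scores)
print(round(toy_auc, 3))
```

On the toy data, 8 of the 9 positive/negative pairs are ranked correctly, so the value is 8/9. A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which gives some context for the reported average of 0.81.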
format Online
Article
Text
id pubmed-4101306
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4101306 2014-07-18 A coupling approach of a predictor and a descriptor for breast cancer prognosis Shin, Hyunjung Nam, Yonghyun BMC Med Genomics Research BioMed Central 2014-05-08 /pmc/articles/PMC4101306/ /pubmed/25080202 http://dx.doi.org/10.1186/1755-8794-7-S1-S4 Text en Copyright © 2014 Shin and Nam; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title A coupling approach of a predictor and a descriptor for breast cancer prognosis
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4101306/
https://www.ncbi.nlm.nih.gov/pubmed/25080202
http://dx.doi.org/10.1186/1755-8794-7-S1-S4