
Feature selection and survival modeling in The Cancer Genome Atlas

PURPOSE: Personalized medicine is predicated on the concept of identifying subgroups of a common disease for better treatment. Identifying biomarkers that predict disease subtypes has been a major focus of biomedical science. In the era of genome-wide profiling, there is controversy as to the optima...


Bibliographic Details
Main Authors: Kim, Hyunsoo; Bredel, Markus
Format: Online Article Text
Language: English
Published: Dove Medical Press 2013
Subjects: Methodology
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3790279/
https://www.ncbi.nlm.nih.gov/pubmed/24098079
http://dx.doi.org/10.2147/IJN.S40733
author Kim, Hyunsoo
Bredel, Markus
collection PubMed
description PURPOSE: Personalized medicine is predicated on identifying subgroups of a common disease so that each can be treated more effectively. Identifying biomarkers that predict disease subtypes has been a major focus of biomedical science. In the era of genome-wide profiling, there is controversy as to the optimal number of genes to supply as input to a feature selection algorithm for survival modeling. PATIENTS AND METHODS: The expression profiles and outcomes of 544 patients were retrieved from The Cancer Genome Atlas. We compared four survival prediction methods: (1) a 1-nearest neighbor (1-NN) survival prediction method; (2) a random patient selection method; (3) a Cox-based regression method with nested cross-validation and least absolute shrinkage and selection operator (LASSO) optimization using whole-genome gene expression profiles; and (4) the same Cox-based method using gene expression profiles of cancer pathway genes. RESULTS: The 1-NN method predicted survival better than the random patient selection method, although it does not include a feature selection step. The Cox-based regression method with LASSO optimization using whole-genome gene expression data demonstrated higher survival prediction power than the 1-NN method, but was outperformed by the same method when using gene expression profiles of cancer pathway genes alone. CONCLUSION: The 1-NN survival prediction method may require more patients for better performance, even when omitting censored data. Using preexisting biological knowledge for survival prediction is reasonable as a means to understand the biological system of a cancer, unless the analysis goal is to identify completely unknown genes relevant to cancer biology.
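As a rough illustration of the first two methods named in the abstract, the Python sketch below predicts a patient's survival time from the single nearest training patient (1-NN) and, as a baseline, from a randomly chosen training patient. It is not the authors' implementation: the Euclidean distance metric, the function names, and the toy expression values are all assumptions made for illustration, and censored patients are assumed to have been removed beforehand.

```python
import math
import random


def nearest_neighbor_survival(train_expr, train_surv, query_expr):
    """Return the survival time of the training patient closest to the query.

    Closeness is Euclidean distance between expression vectors
    (an assumed metric; the abstract does not specify one).
    """
    best_idx = min(
        range(len(train_expr)),
        key=lambda i: math.dist(train_expr[i], query_expr),
    )
    return train_surv[best_idx]


def random_patient_survival(train_surv, rng):
    """Baseline: return the survival time of a randomly chosen training patient."""
    return rng.choice(train_surv)


# Made-up toy data: 4 uncensored training patients, 3 genes each,
# with survival times in months. These are not values from the study.
train_expr = [
    [0.1, 2.0, 1.5],
    [1.9, 0.2, 0.3],
    [0.2, 1.8, 1.4],
    [2.1, 0.1, 0.5],
]
train_surv = [30.0, 8.0, 26.0, 10.0]

query_a = [0.1, 2.0, 1.4]  # closest to the first training patient
query_b = [2.0, 0.1, 0.4]  # closest to the fourth training patient

pred_a = nearest_neighbor_survival(train_expr, train_surv, query_a)
pred_b = nearest_neighbor_survival(train_expr, train_surv, query_b)
pred_random = random_patient_survival(train_surv, random.Random(0))

print(pred_a, pred_b, pred_random)
```

Because the 1-NN prediction uses every gene in the distance computation, it involves no feature selection step, which is consistent with how the abstract characterizes the method.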
format Online
Article
Text
id pubmed-3790279
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Dove Medical Press
record_format MEDLINE/PubMed
spelling pubmed-3790279 2013-10-04 Feature selection and survival modeling in The Cancer Genome Atlas Kim, Hyunsoo Bredel, Markus Int J Nanomedicine Methodology
Dove Medical Press 2013 2013-09-16 /pmc/articles/PMC3790279/ /pubmed/24098079 http://dx.doi.org/10.2147/IJN.S40733 Text en © 2013 Kim and Bredel, publisher and licensee Dove Medical Press Ltd This is an Open Access article which permits unrestricted noncommercial use, provided the original work is properly cited.
title Feature selection and survival modeling in The Cancer Genome Atlas
topic Methodology