Feature selection and survival modeling in The Cancer Genome Atlas
PURPOSE: Personalized medicine is predicated on the concept of identifying subgroups of a common disease for better treatment. Identifying biomarkers that predict disease subtypes has been a major focus of biomedical science. In the era of genome-wide profiling, there is controversy as to the optima...
Main authors: | Kim, Hyunsoo; Bredel, Markus |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Dove Medical Press 2013 |
Subjects: | Methodology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3790279/ https://www.ncbi.nlm.nih.gov/pubmed/24098079 http://dx.doi.org/10.2147/IJN.S40733 |
_version_ | 1782286570698571776 |
---|---|
author | Kim, Hyunsoo Bredel, Markus |
collection | PubMed |
description | PURPOSE: Personalized medicine is predicated on identifying subgroups of a common disease so that each can be treated more effectively. Identifying biomarkers that predict disease subtypes has been a major focus of biomedical science. In the era of genome-wide profiling, the optimal number of genes to use as input to a feature selection algorithm for survival modeling remains controversial. PATIENTS AND METHODS: The expression profiles and outcomes of 544 patients were retrieved from The Cancer Genome Atlas. We compared four survival prediction methods: (1) a 1-nearest neighbor (1-NN) survival prediction method; (2) a random patient selection method; (3) a Cox-based regression method with nested cross-validation and least absolute shrinkage and selection operator (LASSO) optimization using whole-genome gene expression profiles; and (4) the same Cox-based method using gene expression profiles of cancer pathway genes. RESULTS: The 1-NN method predicted survival better than the random patient selection method, although it does not include a feature selection step. The Cox-based regression method with LASSO optimization using whole-genome gene expression data demonstrated higher survival prediction power than the 1-NN method, but was outperformed by the same method when using gene expression profiles of cancer pathway genes alone. CONCLUSION: The 1-NN survival prediction method may require more patients for better performance, even when omitting censored data. Using preexisting biological knowledge for survival prediction is reasonable as a means to understand the biological system of a cancer, unless the analysis goal is to identify completely unknown genes relevant to cancer biology. |
format | Online Article Text |
id | pubmed-3790279 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | Dove Medical Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-3790279 2013-10-04 Feature selection and survival modeling in The Cancer Genome Atlas Kim, Hyunsoo Bredel, Markus Int J Nanomedicine Methodology Dove Medical Press 2013 2013-09-16 /pmc/articles/PMC3790279/ /pubmed/24098079 http://dx.doi.org/10.2147/IJN.S40733 Text en © 2013 Kim and Bredel, publisher and licensee Dove Medical Press Ltd This is an Open Access article which permits unrestricted noncommercial use, provided the original work is properly cited. |
title | Feature selection and survival modeling in The Cancer Genome Atlas |
topic | Methodology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3790279/ https://www.ncbi.nlm.nih.gov/pubmed/24098079 http://dx.doi.org/10.2147/IJN.S40733 |
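
The abstract in this record names a 1-NN survival prediction method and a random patient selection baseline but does not describe how either is implemented. The following is a minimal sketch of the general idea, not the authors' code. It assumes Euclidean distance over expression values, a training set restricted to uncensored patients, and invented toy data. The function names are hypothetical.

```python
import math
import random


def nearest_neighbor_survival(train_profiles, train_survival, query_profile):
    """Predict survival as the survival time of the closest training patient.

    Assumption: closeness is Euclidean distance between expression vectors.
    """
    best_index = min(
        range(len(train_profiles)),
        key=lambda i: math.dist(train_profiles[i], query_profile),
    )
    return train_survival[best_index]


def random_patient_survival(train_survival, rng):
    """Baseline: predict the survival time of a randomly chosen training patient."""
    return rng.choice(train_survival)


# Toy data, invented for illustration only (not taken from The Cancer Genome Atlas).
train_profiles = [[0.1, 2.0, 1.5], [1.9, 0.2, 0.4], [1.0, 1.0, 1.0]]
train_survival = [14.0, 6.5, 10.0]  # survival times in months, uncensored patients
query_profile = [1.8, 0.3, 0.5]

predicted_1nn = nearest_neighbor_survival(train_profiles, train_survival, query_profile)
predicted_random = random_patient_survival(train_survival, random.Random(0))

print("1-NN prediction:", predicted_1nn)
print("Random-patient prediction:", predicted_random)
```

Because the 1-NN prediction copies the outcome of a single neighbor, its quality depends on how densely the training patients cover the expression space, which is consistent with the record's conclusion that the method may require more patients.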