A comparative study of clustering methods on gene expression data for lung cancer prognosis
Lung cancer subtyping based on gene expression data is important for identifying patient subgroups with differing survival prognosis to facilitate customized treatment strategies for each subtype of patients. Unsupervised clustering methods are the traditional approach for clustering patients into s...
Main authors: | Zhang, Jason Z.; Wang, Chi |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10630994/ https://www.ncbi.nlm.nih.gov/pubmed/37941025 http://dx.doi.org/10.1186/s13104-023-06604-8 |
_version_ | 1785132272728408064 |
---|---|
author | Zhang, Jason Z. Wang, Chi |
author_facet | Zhang, Jason Z. Wang, Chi |
author_sort | Zhang, Jason Z. |
collection | PubMed |
description | Lung cancer subtyping based on gene expression data is important for identifying patient subgroups with differing survival prognosis to facilitate customized treatment strategies for each subtype of patients. Unsupervised clustering methods are the traditional approach for clustering patients into subtypes. However, since those methods cluster patients based only on gene expression data, the resulting clusters may not always be relevant to the survival outcome of interest. In recent years, semi-supervised and supervised methods have been proposed, which leverage the survival outcome data to identify clusters more relevant to survival prognosis. This paper aims to compare the performance of different clustering methods for identifying clinically prognostic lung cancer subtypes based on two lung adenocarcinoma datasets. For each method, we clustered patients into two clusters and assessed the difference in patient survival time between clusters. Unsupervised methods were found to have large logrank p-values and no significant results in most cases. Semi-supervised and supervised methods had improved performance over unsupervised methods and very significant p-values. These results indicate that unsupervised methods are not capable of identifying clusters with significant differences in survival prognosis in most cases, while supervised and semi-supervised methods can better cluster patients into clinically useful subtypes. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13104-023-06604-8. |
format | Online Article Text |
id | pubmed-10630994 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-106309942023-11-07 A comparative study of clustering methods on gene expression data for lung cancer prognosis Zhang, Jason Z. Wang, Chi BMC Res Notes Research Note Lung cancer subtyping based on gene expression data is important for identifying patient subgroups with differing survival prognosis to facilitate customized treatment strategies for each subtype of patients. Unsupervised clustering methods are the traditional approach for clustering patients into subtypes. However, since those methods cluster patients based only on gene expression data, the resulting clusters may not always be relevant to the survival outcome of interest. In recent years, semi-supervised and supervised methods have been proposed, which leverage the survival outcome data to identify clusters more relevant to survival prognosis. This paper aims to compare the performance of different clustering methods for identifying clinically prognostic lung cancer subtypes based on two lung adenocarcinoma datasets. For each method, we clustered patients into two clusters and assessed the difference in patient survival time between clusters. Unsupervised methods were found to have large logrank p-values and no significant results in most cases. Semi-supervised and supervised methods had improved performance over unsupervised methods and very significant p-values. These results indicate that unsupervised methods are not capable of identifying clusters with significant differences in survival prognosis in most cases, while supervised and semi-supervised methods can better cluster patients into clinically useful subtypes. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13104-023-06604-8. 
BioMed Central 2023-11-08 /pmc/articles/PMC10630994/ /pubmed/37941025 http://dx.doi.org/10.1186/s13104-023-06604-8 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Note Zhang, Jason Z. Wang, Chi A comparative study of clustering methods on gene expression data for lung cancer prognosis |
title | A comparative study of clustering methods on gene expression data for lung cancer prognosis |
title_full | A comparative study of clustering methods on gene expression data for lung cancer prognosis |
title_fullStr | A comparative study of clustering methods on gene expression data for lung cancer prognosis |
title_full_unstemmed | A comparative study of clustering methods on gene expression data for lung cancer prognosis |
title_short | A comparative study of clustering methods on gene expression data for lung cancer prognosis |
title_sort | comparative study of clustering methods on gene expression data for lung cancer prognosis |
topic | Research Note |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10630994/ https://www.ncbi.nlm.nih.gov/pubmed/37941025 http://dx.doi.org/10.1186/s13104-023-06604-8 |
work_keys_str_mv | AT zhangjasonz acomparativestudyofclusteringmethodsongeneexpressiondataforlungcancerprognosis AT wangchi acomparativestudyofclusteringmethodsongeneexpressiondataforlungcancerprognosis AT zhangjasonz comparativestudyofclusteringmethodsongeneexpressiondataforlungcancerprognosis AT wangchi comparativestudyofclusteringmethodsongeneexpressiondataforlungcancerprognosis |
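
The description above says that each method split patients into two clusters and that the clusters were then compared by a logrank p-value on survival time. The evaluation step might look roughly like the following minimal sketch. It is not the authors' code: the function name `logrank_test`, the toy follow-up times, and the cluster labels are all hypothetical, and the cluster labels are assumed to come from whichever clustering method is being assessed.

```python
import math


def logrank_test(times, events, groups):
    """Two-group log-rank test (illustrative sketch, not the paper's code).

    times  : follow-up time for each patient
    events : 1 if death was observed at that time, 0 if censored
    groups : cluster label for each patient, 0 or 1
    Returns (chi2, p_value) with 1 degree of freedom.
    """
    # Only time points at which at least one death was observed contribute.
    event_times = sorted({t for t, e in zip(times, events) if e == 1})

    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        # Patients still at risk just before time t (overall and in cluster 1).
        n = sum(1 for ti in times if ti >= t)
        n1 = sum(1 for ti, g in zip(times, groups) if ti >= t and g == 1)
        # Deaths occurring exactly at time t (overall and in cluster 1).
        d = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        d1 = sum(
            1
            for ti, e, g in zip(times, events, groups)
            if ti == t and e == 1 and g == 1
        )
        # Expected deaths in cluster 1 if both clusters shared one hazard.
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            # Hypergeometric variance of the cluster-1 death count.
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0
    chi2 = observed_minus_expected ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical toy data: ten patients, all deaths observed.
toy_times = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
toy_events = [1] * 10

# A clustering that separates short from long survivors.
separated = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
# A clustering unrelated to survival time.
interleaved = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]

for name, labels in [("separated", separated), ("interleaved", interleaved)]:
    chi2, p = logrank_test(toy_times, toy_events, labels)
    print(f"{name}: chi2={chi2:.2f}, p={p:.4f}")
```

On the toy data, the labelling that separates short from long survivors gives a small p-value and the interleaved labelling gives a large one, which mirrors the contrast the description draws between clusterings that are and are not relevant to survival. For real analyses an established survival-analysis library would be a safer choice than this hand-rolled version.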