A comparative study of clustering methods on gene expression data for lung cancer prognosis

Lung cancer subtyping based on gene expression data is important for identifying patient subgroups with differing survival prognosis to facilitate customized treatment strategies for each subtype of patients. Unsupervised clustering methods are the traditional approach for clustering patients into s...

Bibliographic Details
Main authors: Zhang, Jason Z., Wang, Chi
Format: Online Article Text
Language: English
Published: BioMed Central 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10630994/
https://www.ncbi.nlm.nih.gov/pubmed/37941025
http://dx.doi.org/10.1186/s13104-023-06604-8
author Zhang, Jason Z.
Wang, Chi
collection PubMed
description Lung cancer subtyping based on gene expression data is important for identifying patient subgroups with differing survival prognosis to facilitate customized treatment strategies for each subtype of patients. Unsupervised clustering methods are the traditional approach for clustering patients into subtypes. However, since those methods cluster patients based only on gene expression data, the resulting clusters may not always be relevant to the survival outcome of interest. In recent years, semi-supervised and supervised methods have been proposed, which leverage the survival outcome data to identify clusters more relevant to survival prognosis. This paper aims to compare the performance of different clustering methods for identifying clinically prognostic lung cancer subtypes based on two lung adenocarcinoma datasets. For each method, we clustered patients into two clusters and assessed the difference in patient survival time between clusters. Unsupervised methods were found to have large logrank p-values and no significant results in most cases. Semi-supervised and supervised methods had improved performance over unsupervised methods and very significant p-values. These results indicate that unsupervised methods are not capable of identifying clusters with significant differences in survival prognosis in most cases, while supervised and semi-supervised methods can better cluster patients into clinically useful subtypes. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13104-023-06604-8.
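The evaluation step named in the description (split patients into two clusters, then compare survival between the clusters with a log-rank test) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `logrank_two_groups`, the toy survival times, and the cluster labels are all hypothetical, and the labels are assumed to come from whichever clustering method is being assessed.

```python
import math


def logrank_two_groups(times, events, labels):
    """Two-sample log-rank test (hypothetical helper, for illustration only).

    times  : follow-up time per patient
    events : 1 if the event (death) was observed, 0 if censored
    labels : cluster assignment per patient, 0 or 1
    Returns (chi-square statistic with 1 degree of freedom, p-value).
    """
    # Distinct times at which at least one event was observed.
    event_times = sorted({t for t, e in zip(times, events) if e})
    obs_minus_exp = 0.0  # observed minus expected events in cluster 1
    variance = 0.0       # hypergeometric variance summed over event times
    for t in event_times:
        n = sum(1 for ti in times if ti >= t)  # all patients at risk
        n1 = sum(1 for ti, g in zip(times, labels) if ti >= t and g == 1)
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        d1 = sum(1 for ti, e, g in zip(times, events, labels)
                 if ti == t and e and g == 1)
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Toy data (made up): eight patients, all events observed.
toy_times = [1, 2, 3, 4, 5, 6, 7, 8]
toy_events = [1] * 8
separated = [1, 1, 1, 1, 0, 0, 0, 0]    # clusters track survival time
interleaved = [1, 0, 1, 0, 1, 0, 1, 0]  # clusters unrelated to survival

chi2_sep, p_sep = logrank_two_groups(toy_times, toy_events, separated)
chi2_mix, p_mix = logrank_two_groups(toy_times, toy_events, interleaved)
print(f"separated clusters:   chi2={chi2_sep:.3f}, p={p_sep:.4f}")
print(f"interleaved clusters: chi2={chi2_mix:.3f}, p={p_mix:.4f}")
```

On the toy data, the labels that track survival time give a much smaller p-value than the interleaved labels, which mirrors the contrast the description draws between clusterings that are relevant to survival and those that are not.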
format Online
Article
Text
id pubmed-10630994
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-10630994 2023-11-07 A comparative study of clustering methods on gene expression data for lung cancer prognosis Zhang, Jason Z. Wang, Chi BMC Res Notes Research Note
BioMed Central 2023-11-08 /pmc/articles/PMC10630994/ /pubmed/37941025 http://dx.doi.org/10.1186/s13104-023-06604-8 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title A comparative study of clustering methods on gene expression data for lung cancer prognosis
topic Research Note