Comparison of cancer subtype identification methods combined with feature selection methods in omics data analysis


Bibliographic Details
Main authors:	Park, JiYoon; Lee, Jae Won; Park, Mira
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2023
Subjects:	Methodology
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10329370/ https://www.ncbi.nlm.nih.gov/pubmed/37420304 http://dx.doi.org/10.1186/s13040-023-00334-0

collection	PubMed
description	BACKGROUND: Cancer subtype identification is important for the early diagnosis of cancer and the provision of adequate treatment. Prior to identifying the subtype of cancer in a patient, feature selection is also crucial for reducing the dimensionality of the data by detecting genes that contain important information about the cancer subtype. Numerous cancer subtyping methods have been developed, and their performance has been compared. However, combinations of feature selection and subtype identification methods have rarely been considered. This study aimed to identify the best combination of variable selection and subtype identification methods in single omics data analysis. RESULTS: Combinations of six filter-based methods and six unsupervised subtype identification methods were investigated using The Cancer Genome Atlas (TCGA) datasets for four cancers. The number of features selected varied, and several evaluation metrics were used. Although no single combination was found to have a distinctively good performance, Consensus Clustering (CC) and Neighborhood-Based Multi-omics Clustering (NEMO) used with variance-based feature selection had a tendency to show lower p-values, and nonnegative matrix factorization (NMF) stably showed good performance in many cases unless the Dip test was used for feature selection. In terms of accuracy, the combination of NMF and similarity network fusion (SNF) with Monte Carlo Feature Selection (MCFS) and Minimum-Redundancy Maximum Relevance (mRMR) showed good overall performance. NMF always showed among the worst performances without feature selection in all datasets, but performed much better when used with various feature selection methods. iClusterBayes (ICB) had decent performance when used without feature selection. CONCLUSIONS: Rather than a single method clearly emerging as optimal, the best methodology was different depending on the data used, the number of features selected, and the evaluation method. A guideline for choosing the best combination method under various situations is provided.
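The abstract describes a two-stage pipeline: a filter-based feature selection step that keeps informative genes, followed by unsupervised subtype identification on the reduced matrix. The sketch below is a minimal illustration of one such pairing, a variance filter followed by basic NMF clustering, and it assumes NumPy and synthetic data. The function names `select_top_variance` and `nmf_cluster`, the toy data, and the argmax cluster assignment are hypothetical illustrations, not the authors' implementation. Published NMF subtyping pipelines often also aggregate many runs (for example, through consensus matrices), and that step is omitted here.

```python
import numpy as np


def select_top_variance(X, n_features):
    """Variance-based filter (illustrative): keep the n_features columns
    (genes) of X, a samples x genes matrix, with the largest variance."""
    variances = X.var(axis=0)
    top = np.argsort(variances)[::-1][:n_features]  # highest variance first
    return X[:, top], top


def nmf_cluster(X, k, n_iter=500, seed=0, eps=1e-10):
    """Basic NMF clustering of the rows (samples) of a nonnegative matrix X.

    Factorizes X ~ W @ H with Lee-Seung multiplicative updates for the
    squared Frobenius loss. Each sample is then assigned to the component
    that contributes the most to its row.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    # Weight each component by the total mass of its H row so that the
    # assignment does not depend on how scale is split between W and H.
    weights = W * H.sum(axis=1)
    return weights.argmax(axis=1)


# Hypothetical toy data: 40 samples x 200 genes, two subtypes of 20 samples.
# Genes 0-4 are shifted up in subtype A (rows 0-19), genes 5-9 are shifted
# up in subtype B (rows 20-39), and all values are nonnegative.
rng = np.random.default_rng(42)
X = rng.random((40, 200))
X[:20, 0:5] += 5.0
X[20:, 5:10] += 5.0

X_sel, selected = select_top_variance(X, n_features=10)
labels = nmf_cluster(X_sel, k=2)
print(sorted(selected.tolist()))  # → [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(labels)
```

Every unshifted gene takes values in [0, 1), so its variance is at most 0.25, while each shifted gene has a variance of roughly 6; the filter therefore keeps exactly the ten informative genes. NMF requires nonnegative input, which is why the toy matrix contains only nonnegative values. Passing all 200 columns of `X` to `nmf_cluster` instead of `X_sel` is a simple way to see how the filter changes what NMF is asked to factorize.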
id	pubmed-10329370
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
journal	BioData Min (article type: Methodology), BioMed Central, published 2023-07-07
rights	© The Author(s) 2023. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.