
SRIQ clustering: A fusion of Random Forest, QT clustering, and KNN concepts

Gene expression profiling together with unsupervised analysis methods, typically clustering methods, has been used extensively in cancer research to unravel, e.g., new molecular subtypes that hold promise of disease refinement that may ultimately benefit patients. However, many of the commonly used...

Bibliographic Details
Main authors: Karlström, Jacob; Aine, Mattias; Staaf, Johan; Veerla, Srinivas
Format: Online Article Text
Language: English
Published: Research Network of Computational and Structural Biotechnology 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9010551/
https://www.ncbi.nlm.nih.gov/pubmed/35465158
http://dx.doi.org/10.1016/j.csbj.2022.03.036
author Karlström, Jacob
Aine, Mattias
Staaf, Johan
Veerla, Srinivas
collection PubMed
description Gene expression profiling together with unsupervised analysis methods, typically clustering methods, has been used extensively in cancer research to unravel, e.g., new molecular subtypes that hold promise of disease refinement that may ultimately benefit patients. However, many of the commonly used methods require a prespecified number of clusters to extract and frequently require some type of feature pre-selection, e.g., variance filtering. This introduces subjectivity to the process of cluster discovery and the definition of putative novel tumor subtypes. Here, we introduce SRIQ, a novel unsupervised clustering method that could circumvent some of the issues in commonly used unsupervised analysis methods. SRIQ incorporates concepts from random forest machine learning as well as quality threshold- and k-nearest neighbor clustering. It is implemented as a Java and Python pipeline including data pre-processing, differential expression analysis, and pathway analysis. Using 434 lung adenocarcinomas profiled by RNA sequencing, we demonstrate the technical reproducibility of SRIQ and benchmark its performance against the commonly used consensus clustering method. Based on differential gene expression analysis and auxiliary molecular data, we show that SRIQ can define new tumor subsets that appear biologically relevant and consistent, and that these new subgroups seem to refine existing transcriptional subtypes defined using consensus clustering. Together, this provides support that SRIQ may be a useful new tool for unsupervised analysis of gene expression data from human malignancies.
format Online
Article
Text
id pubmed-9010551
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Research Network of Computational and Structural Biotechnology
record_format MEDLINE/PubMed
spelling pubmed-9010551 2022-04-21 Comput Struct Biotechnol J Research Article
Research Network of Computational and Structural Biotechnology 2022-04-04 /pmc/articles/PMC9010551/ /pubmed/35465158 http://dx.doi.org/10.1016/j.csbj.2022.03.036 Text en © 2022 The Author(s) https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
title SRIQ clustering: A fusion of Random Forest, QT clustering, and KNN concepts
topic Research Article