
Clustering Sparse Data With Feature Correlation With Application to Discover Subtypes in Cancer


Bibliographic Details
Main Authors:	QIANG, JIPENG; DING, WEI; KUIJJER, MARIEKE; QUACKENBUSH, JOHN; CHEN, PING
Format:	Online Article Text
Language:	English
Published:	2020
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9629797/ https://www.ncbi.nlm.nih.gov/pubmed/36329870 http://dx.doi.org/10.1109/access.2020.2982569

author	QIANG, JIPENG; DING, WEI; KUIJJER, MARIEKE; QUACKENBUSH, JOHN; CHEN, PING
collection	PubMed
description	In this paper, given data with high-dimensional features, we study the problem of how to calculate the similarity between two samples by considering a feature interaction network, which represents the relationships between features. This differs from some traditional methods, which learn similarities from a sample network that represents the relationships between samples. We therefore propose a novel network-based similarity metric for computing the similarity between samples that incorporates knowledge of the feature interaction network in order to overcome the data sparseness problem. Our similarity metric uses a new Feature Alignment Similarity measure, which does not compute the similarities among samples directly; instead, it projects each sample into the feature interaction network and measures the similarity between two samples using the similarities between their vertices in the network. As a result, even when two samples share no common features, they are still likely to receive high similarity values when their features lie in similar regions of the network. To ensure that the metric is useful in a real-world application, we apply it to discover subtypes in tumor mutational data by incorporating information from the gene interaction network. Our experimental results on synthetic data and real-world tumor mutational data show that our approach outperforms the top competitors in cancer subtype discovery. Furthermore, our approach can identify cancer subtypes in real cancer data that cannot be detected by other clustering algorithms.
format	Online Article Text
id	pubmed-9629797
institution	National Center for Biotechnology Information
language	English
publishDate	2020
record_format	MEDLINE/PubMed
spelling	pubmed-9629797 2022-11-02. QIANG, JIPENG; DING, WEI; KUIJJER, MARIEKE; QUACKENBUSH, JOHN; CHEN, PING. Clustering Sparse Data With Feature Correlation With Application to Discover Subtypes in Cancer. IEEE Access, Article, 2020 (2020-03-26). /pmc/articles/PMC9629797/ /pubmed/36329870 http://dx.doi.org/10.1109/access.2020.2982569. Text, en. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
title	Clustering Sparse Data With Feature Correlation With Application to Discover Subtypes in Cancer
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9629797/ https://www.ncbi.nlm.nih.gov/pubmed/36329870 http://dx.doi.org/10.1109/access.2020.2982569
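The description above explains that the Feature Alignment Similarity projects each sample onto a feature interaction network and compares two samples through the similarities of their vertices, so samples with no shared features can still score as similar. The Python sketch below illustrates only that general idea; it is not the measure defined in the paper. The vertex similarity 1 / (1 + hop distance), the symmetric best-match averaging, the function names (`build_graph`, `hop_distances`, `directed_alignment`, `alignment_similarity`), and the toy features g1 to g5 are all hypothetical choices made for illustration.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple

# Hypothetical representation: an undirected feature interaction network as an adjacency map.
Graph = Dict[str, Set[str]]


def build_graph(edges: Iterable[Tuple[str, str]]) -> Graph:
    """Build an undirected adjacency map from (feature, feature) edges."""
    graph: Graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def hop_distances(graph: Graph, source: str) -> Dict[str, int]:
    """Breadth-first search: hop count from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def directed_alignment(graph: Graph, src: Set[str], dst: Set[str]) -> float:
    """Average, over features in src, of the best vertex similarity to any feature in dst.

    Assumed vertex similarity: 1 / (1 + hop distance), and 0 when unreachable.
    """
    total = 0.0
    for x in src:
        dist = hop_distances(graph, x)
        total += max(1.0 / (1.0 + dist[y]) if y in dist else 0.0 for y in dst)
    return total / len(src)


def alignment_similarity(graph: Graph, a: Set[str], b: Set[str]) -> float:
    """Symmetric similarity of two samples, each given as its set of non-zero features."""
    if not a or not b:
        return 0.0
    return 0.5 * (directed_alignment(graph, a, b) + directed_alignment(graph, b, a))


if __name__ == "__main__":
    # Toy network with hypothetical features: a chain g1-g2-g3 and a separate edge g4-g5.
    g = build_graph([("g1", "g2"), ("g2", "g3"), ("g4", "g5")])
    s1, s2, s3 = {"g1"}, {"g2"}, {"g4"}
    print(alignment_similarity(g, s1, s2))  # no shared feature, but adjacent vertices
    print(alignment_similarity(g, s1, s3))  # features lie in disconnected regions
```

Running the module prints 0.5 and then 0.0: samples `s1` and `s2` share no feature (their Jaccard overlap is zero) yet receive a positive score because their features are neighbors in the network. A matrix of such pairwise scores could then be passed to a clustering algorithm that accepts precomputed similarities; that step is not shown here.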


Similar Items