Cargando…

Robust hypergraph regularized non-negative matrix factorization for sample clustering and feature selection in multi-view gene expression data

BACKGROUND: As one of the most popular data representation methods, non-negative matrix decomposition (NMF) has been widely concerned in the tasks of clustering and feature selection. However, most of the previously proposed NMF-based methods do not adequately explore the hidden geometrical structur...

Descripción completa

Detalles Bibliográficos
Autores principales: Yu, Na, Gao, Ying-Lian, Liu, Jin-Xing, Wang, Juan, Shang, Junliang
Formato: Online Artículo Texto
Lenguaje:English
Publicado: BioMed Central 2019
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6805321/
https://www.ncbi.nlm.nih.gov/pubmed/31639067
http://dx.doi.org/10.1186/s40246-019-0222-6
Descripción
Sumario:BACKGROUND: As one of the most popular data representation methods, non-negative matrix decomposition (NMF) has been widely concerned in the tasks of clustering and feature selection. However, most of the previously proposed NMF-based methods do not adequately explore the hidden geometrical structure in the data. At the same time, noise and outliers are inevitably present in the data. RESULTS: To alleviate these problems, we present a novel NMF framework named robust hypergraph regularized non-negative matrix factorization (RHNMF). In particular, the hypergraph Laplacian regularization is imposed to capture the geometric information of original data. Unlike graph Laplacian regularization which captures the relationship between pairwise sample points, it captures the high-order relationship among more sample points. Moreover, the robustness of the RHNMF is enhanced by using the L(2,1)-norm constraint when estimating the residual. This is because the L(2,1)-norm is insensitive to noise and outliers. CONCLUSIONS: Clustering and common abnormal expression gene (com-abnormal expression gene) selection are conducted to test the validity of the RHNMF model. Extensive experimental results on multi-view datasets reveal that our proposed model outperforms other state-of-the-art methods.