Applications of a Novel Clustering Approach Using Non-Negative Matrix Factorization to Environmental Research in Public Health

Often data can be represented as a matrix, e.g., observations as rows and variables as columns, or as a doubly classified contingency table. Researchers may be interested in clustering the observations, the variables, or both. If the data is non-negative, then Non-negative Matrix Factorization (NMF)...

Bibliographic Details
Main Authors: Fogel, Paul; Gaston-Mathé, Yann; Hawkins, Douglas; Fogel, Fajwel; Luta, George; Young, S. Stanley
Format: Online Article Text
Language: English
Published: MDPI 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4881134/
https://www.ncbi.nlm.nih.gov/pubmed/27213413
http://dx.doi.org/10.3390/ijerph13050509
_version_ 1782433917488332800
author Fogel, Paul
Gaston-Mathé, Yann
Hawkins, Douglas
Fogel, Fajwel
Luta, George
Young, S. Stanley
author_facet Fogel, Paul
Gaston-Mathé, Yann
Hawkins, Douglas
Fogel, Fajwel
Luta, George
Young, S. Stanley
author_sort Fogel, Paul
collection PubMed
description Often data can be represented as a matrix, e.g., observations as rows and variables as columns, or as a doubly classified contingency table. Researchers may be interested in clustering the observations, the variables, or both. If the data is non-negative, then Non-negative Matrix Factorization (NMF) can be used to perform the clustering. By its nature, NMF-based clustering focuses on the large values. If the data is normalized by subtracting the row/column means, it takes on mixed signs and the original NMF cannot be used. Our idea is to split and then concatenate the positive and negative parts of the matrix, after taking the absolute value of the negative elements. NMF applied to the concatenated data, which we call PosNegNMF, offers the advantages of the original NMF approach, while giving equal weight to large and small values. We use two public health datasets to illustrate the new method and compare it with alternative clustering methods, such as K-means and clustering methods based on the Singular Value Decomposition (SVD) or Principal Component Analysis (PCA). With the exception of situations where a reasonably accurate factorization can be achieved using the first SVD component, we recommend that epidemiologists and environmental scientists use the new method to obtain clusters with improved quality and interpretability.
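The split-and-concatenate step in the description above can be illustrated with a small sketch. It is not the authors' implementation: it assumes column-mean centering, a plain multiplicative-update NMF with squared-error loss, and cluster assignment by the largest row weight, none of which this record confirms. All function names (`center_columns`, `split_pos_neg`, `nmf`) and the toy matrix are hypothetical, and the code uses only the standard library to stay self-contained.

```python
import random

def center_columns(X):
    """Subtract each column mean, which produces a mixed-sign matrix."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    return [[x - m for x, m in zip(row, means)] for row in X]

def split_pos_neg(X):
    """Concatenate the positive part and the absolute value of the negative part."""
    return [[max(x, 0.0) for x in row] + [max(-x, 0.0) for x in row] for row in X]

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def nmf(V, k, iters=200, seed=0, eps=1e-9):
    """Multiplicative updates for V ~ W H (an assumed, generic NMF solver)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # Update H: H <- H * (W^T V) / (W^T W H)
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        # Update W: W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
    return W, H

# Toy non-negative data: 4 observations (rows) by 3 variables (columns).
X = [[1.0, 5.0, 2.0],
     [2.0, 4.0, 1.0],
     [9.0, 1.0, 8.0],
     [8.0, 2.0, 9.0]]

centered = center_columns(X)   # mixed signs, so ordinary NMF does not apply
V = split_pos_neg(centered)    # 4 x 6 and non-negative again
W, H = nmf(V, k=2)

# Assumed assignment rule: each observation joins its largest-weight component.
labels = [max(range(2), key=lambda j: row[j]) for row in W]
```

The concatenated matrix has twice as many columns as the input, and subtracting its second half from its first half recovers the centered matrix, so no information is lost in the split. Values far below the mean appear as large entries in the second half, which is how small values can carry the same weight as large ones.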
format Online
Article
Text
id pubmed-4881134
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-4881134 2016-05-27 Applications of a Novel Clustering Approach Using Non-Negative Matrix Factorization to Environmental Research in Public Health Fogel, Paul Gaston-Mathé, Yann Hawkins, Douglas Fogel, Fajwel Luta, George Young, S. Stanley Int J Environ Res Public Health Article MDPI 2016-05-18 2016-05 /pmc/articles/PMC4881134/ /pubmed/27213413 http://dx.doi.org/10.3390/ijerph13050509 Text en © 2016 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC-BY) license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Fogel, Paul
Gaston-Mathé, Yann
Hawkins, Douglas
Fogel, Fajwel
Luta, George
Young, S. Stanley
Applications of a Novel Clustering Approach Using Non-Negative Matrix Factorization to Environmental Research in Public Health
title Applications of a Novel Clustering Approach Using Non-Negative Matrix Factorization to Environmental Research in Public Health
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4881134/
https://www.ncbi.nlm.nih.gov/pubmed/27213413
http://dx.doi.org/10.3390/ijerph13050509