Model-Based Clustering with Measurement or Estimation Errors

Model-based clustering with finite mixture models has become a widely used clustering method. One of the recent implementations is MCLUST. When objects to be clustered are summary statistics, such as regression coefficient estimates, they are naturally associated with estimation errors, whose covari...

Bibliographic Details
Main authors: Zhang, Wanli; Di, Yanming
Format: Online Article Text
Language: English
Published: MDPI 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7074130/
https://www.ncbi.nlm.nih.gov/pubmed/32050700
http://dx.doi.org/10.3390/genes11020185
_version_ 1783506766824210432
author Zhang, Wanli
Di, Yanming
author_facet Zhang, Wanli
Di, Yanming
author_sort Zhang, Wanli
collection PubMed
description Model-based clustering with finite mixture models has become a widely used clustering method. One of the recent implementations is MCLUST. When the objects to be clustered are summary statistics, such as regression coefficient estimates, they are naturally associated with estimation errors, whose covariance matrices can often be calculated exactly or approximated using asymptotic theory. This article proposes an extension to Gaussian finite mixture modeling, called MCLUST-ME, that properly accounts for the estimation errors. More specifically, we assume that the distribution of each observation consists of an underlying true component distribution and an independent measurement error distribution. Under this assumption, each unique value of the estimation error covariance corresponds to its own classification boundary, which consequently results in a different grouping from MCLUST. Through simulation and application to an RNA-Seq data set, we found that, under certain circumstances, explicitly modeling estimation errors improves clustering performance or provides new insights into the data compared with simply ignoring the errors. The degree of improvement depends on factors such as the distribution of the error covariance matrices.
format Online
Article
Text
id pubmed-7074130
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-7074130 2020-03-19 Model-Based Clustering with Measurement or Estimation Errors Zhang, Wanli Di, Yanming Genes (Basel) Article Model-based clustering with finite mixture models has become a widely used clustering method. One of the recent implementations is MCLUST. When the objects to be clustered are summary statistics, such as regression coefficient estimates, they are naturally associated with estimation errors, whose covariance matrices can often be calculated exactly or approximated using asymptotic theory. This article proposes an extension to Gaussian finite mixture modeling, called MCLUST-ME, that properly accounts for the estimation errors. More specifically, we assume that the distribution of each observation consists of an underlying true component distribution and an independent measurement error distribution. Under this assumption, each unique value of the estimation error covariance corresponds to its own classification boundary, which consequently results in a different grouping from MCLUST. Through simulation and application to an RNA-Seq data set, we found that, under certain circumstances, explicitly modeling estimation errors improves clustering performance or provides new insights into the data compared with simply ignoring the errors. The degree of improvement depends on factors such as the distribution of the error covariance matrices. MDPI 2020-02-10 /pmc/articles/PMC7074130/ /pubmed/32050700 http://dx.doi.org/10.3390/genes11020185 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Zhang, Wanli
Di, Yanming
Model-Based Clustering with Measurement or Estimation Errors
title Model-Based Clustering with Measurement or Estimation Errors
title_full Model-Based Clustering with Measurement or Estimation Errors
title_fullStr Model-Based Clustering with Measurement or Estimation Errors
title_full_unstemmed Model-Based Clustering with Measurement or Estimation Errors
title_short Model-Based Clustering with Measurement or Estimation Errors
title_sort model-based clustering with measurement or estimation errors
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7074130/
https://www.ncbi.nlm.nih.gov/pubmed/32050700
http://dx.doi.org/10.3390/genes11020185
work_keys_str_mv AT zhangwanli modelbasedclusteringwithmeasurementorestimationerrors
AT diyanming modelbasedclusteringwithmeasurementorestimationerrors
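
Illustrative note on the description field: the abstract states that each observation combines a true component distribution with an independent measurement error distribution, and that each error covariance therefore gets its own classification boundary. The sketch below is not the authors' MCLUST-ME code. It assumes Gaussian errors with known covariance, so that an observation from component k has covariance equal to the component covariance plus the error covariance. The function names and the one-dimensional parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np

def gaussian_logpdf(y, mean, cov):
    """Log density of a multivariate normal N(mean, cov) evaluated at y."""
    d = y.shape[0]
    diff = y - mean
    _, logdet = np.linalg.slogdet(cov)
    quad = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + quad)

def membership_probs(y, err_cov, weights, means, covs):
    """Posterior component probabilities for one observation.

    Assumption: y = x + e, with x drawn from component k as N(mean_k, cov_k)
    and e ~ N(0, err_cov) independent of x, so y | k ~ N(mean_k, cov_k + err_cov).
    """
    log_terms = np.array([
        np.log(w) + gaussian_logpdf(y, m, c + err_cov)
        for w, m, c in zip(weights, means, covs)
    ])
    log_terms -= log_terms.max()  # subtract the max for numerical stability
    probs = np.exp(log_terms)
    return probs / probs.sum()

# Hypothetical two-component mixture in one dimension.
weights = [0.5, 0.5]
means = [np.array([0.0]), np.array([2.0])]
covs = [np.array([[0.25]]), np.array([[4.0]])]

y = np.array([1.0])
p_no_err = membership_probs(y, np.array([[0.0]]), weights, means, covs)
p_err = membership_probs(y, np.array([[1.0]]), weights, means, covs)

print("error ignored:       ", p_no_err, "-> component", int(p_no_err.argmax()))
print("error variance = 1.0:", p_err, "-> component", int(p_err.argmax()))
```

With these made-up parameters, the same observed value is assigned to the wide component when its error is ignored and to the narrow component when an error variance of 1 is added to both component variances, which is the boundary shift the abstract describes.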