SiGMoiD: A super-statistical generative model for binary data

In modern computational biology, there is great interest in building probabilistic models to describe collections of a large number of co-varying binary variables. However, current approaches to build generative models rely on modelers’ identification of constraints and are computationally expensive...

Bibliographic Details
Main Authors: Zhao, Xiaochuan; Plata, Germán; Dixit, Purushottam D.
Format: Online Article Text
Language: English
Published: Public Library of Science 2021
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8372922/
https://www.ncbi.nlm.nih.gov/pubmed/34358223
http://dx.doi.org/10.1371/journal.pcbi.1009275
author Zhao, Xiaochuan
Plata, Germán
Dixit, Purushottam D.
collection PubMed
description In modern computational biology, there is great interest in building probabilistic models to describe collections of a large number of co-varying binary variables. However, current approaches to building generative models rely on the modeler’s identification of constraints and are computationally expensive to infer when the number of variables is large (N~100). Here, we address both these issues with the Super-statistical Generative Model for binary Data (SiGMoiD). SiGMoiD is a maximum entropy-based framework in which we imagine the data as arising from a super-statistical system: individual binary variables in a given sample are coupled to the same ‘bath’, whose intensive variables vary from sample to sample. Importantly, unlike standard maximum entropy approaches, where the modeler specifies the constraints, the SiGMoiD algorithm infers them directly from the data. Due to this optimal choice of constraints, SiGMoiD allows us to model collections of a very large number (N>1000) of binary variables. Finally, SiGMoiD offers a reduced-dimensional description of the data, allowing us to identify clusters of similar data points as well as of binary variables. We illustrate the versatility of SiGMoiD using multiple datasets spanning several time- and length-scales.
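The "shared bath" idea in the description can be illustrated with a small simulation. The sketch below is not the authors' algorithm: the logistic form of the conditional probability, the Gaussian distribution of the bath variables, and the names `energies`, `sample_superstatistical`, and `covariance` are all assumptions made for illustration. It only shows how variables that are independent given the bath still co-vary across samples once the bath's intensive variables change from sample to sample.

```python
import math
import random


def sample_superstatistical(n_samples, energies, rng):
    """Draw binary samples whose variables share one per-sample 'bath'.

    energies[i][k] is a hypothetical feature of binary variable i; the
    logistic form below is an assumption, not taken from the record.
    """
    n_latent = len(energies[0])
    samples, baths = [], []
    for _ in range(n_samples):
        # Intensive variables of the bath: redrawn for every sample.
        beta = [rng.gauss(0.0, 1.0) for _ in range(n_latent)]
        row = []
        for e_i in energies:
            # Conditional on beta, each variable is an independent Bernoulli.
            field = sum(b * e for b, e in zip(beta, e_i))
            p_on = 1.0 / (1.0 + math.exp(field))
            row.append(1 if rng.random() < p_on else 0)
        samples.append(row)
        baths.append(beta)
    return samples, baths


def covariance(samples, i, j):
    """Empirical covariance between binary variables i and j."""
    n = len(samples)
    mean_i = sum(s[i] for s in samples) / n
    mean_j = sum(s[j] for s in samples) / n
    return sum((s[i] - mean_i) * (s[j] - mean_j) for s in samples) / n


rng = random.Random(0)
# Three variables, one latent dimension: variables 0 and 1 respond to the
# bath in the same direction, variable 2 in the opposite direction.
energies = [[2.0], [2.0], [-2.0]]
samples, baths = sample_superstatistical(5000, energies, rng)
cov_same = covariance(samples, 0, 1)
cov_opposite = covariance(samples, 0, 2)
```

In this toy setup, `cov_same` comes out positive and `cov_opposite` negative, even though no direct coupling between variables was specified; the correlations arise entirely from the sample-to-sample variation of the bath. Inferring the bath variables and the per-variable features from data, which is what the description attributes to SiGMoiD, is not attempted here.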
format Online
Article
Text
id pubmed-8372922
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-8372922 2021-08-19 SiGMoiD: A super-statistical generative model for binary data. Zhao, Xiaochuan; Plata, Germán; Dixit, Purushottam D. PLoS Comput Biol. Research Article. Public Library of Science 2021-08-06 /pmc/articles/PMC8372922/ /pubmed/34358223 http://dx.doi.org/10.1371/journal.pcbi.1009275 Text en © 2021 Zhao et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title SiGMoiD: A super-statistical generative model for binary data
topic Research Article