
High dimensional model representation of log-likelihood ratio: binary classification with expression data

Bibliographic Details
Main authors:	Foroughi pour, Ali; Pietrzak, Maciej; Dalton, Lori A.; Rempała, Grzegorz A.
Format:	Online Article Text
Language:	English
Published:	BioMed Central, 2020
Subjects:	Methodology Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7183128/ https://www.ncbi.nlm.nih.gov/pubmed/32334509 http://dx.doi.org/10.1186/s12859-020-3486-x

collection	PubMed
description	BACKGROUND: Binary classification rules based on small samples of high-dimensional data (for instance, gene expression data) are ubiquitous in modern bioinformatics. Constructing such classifiers is challenging because of (a) the complex nature of the underlying biological traits, such as gene interactions, and (b) the need for highly interpretable glass-box models. We use the theory of high-dimensional model representation (HDMR) to build interpretable low-dimensional approximations of the log-likelihood ratio that account for the effects of each individual gene as well as of gene-gene interactions. We propose two algorithms that approximate the second-order HDMR expansion, and a hypothesis test based on the HDMR formulation to identify significantly dysregulated pairwise interactions. The theory is flexible and requires only a mild set of assumptions. RESULTS: We apply our approach to gene expression data from both synthetic and real (breast and lung cancer) datasets, and compare it against several popular state-of-the-art methods. The analyses suggest that the proposed algorithms can be used to obtain interpretable prediction rules with high prediction accuracy and to extract significantly dysregulated gene-gene interactions from the data. The algorithms also compare favorably with competing methods across multiple synthetic data scenarios. CONCLUSION: The proposed HDMR-based approach appears to produce a reliable classifier that additionally allows one to describe how individual genes or gene-gene interactions affect classification decisions. Both real and synthetic data analyses suggest that our methods can be used to identify gene networks with dysregulated pairwise interactions, and are therefore appropriate for differential network analysis.
id	pubmed-7183128
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
journal	BMC Bioinformatics
published	2020-04-25
rights	© The Author(s) 2020. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
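The abstract describes approximating the log-likelihood ratio (LLR) by a second-order HDMR expansion: a constant, plus one component per gene, plus one component per gene pair, f(x) ≈ f_0 + Σ_i f_i(x_i) + Σ_{i<j} f_ij(x_i, x_j). This record does not spell out the authors' two algorithms, so what follows is only a minimal Python sketch of one standard HDMR variant, cut-HDMR (each component is obtained by holding all other coordinates at a reference "cut" point c). It is applied to a toy two-class Gaussian LLR. The Gaussian model, the cut point, the random parameters, and the helper names `cut_hdmr2_terms`, `cut_hdmr2` and `gaussian_llr` are hypothetical illustrations, not the paper's method. Because a Gaussian LLR is quadratic in x, the second-order expansion reproduces it exactly, and the pairwise terms f_ij are where gene-gene interaction effects would show up.

```python
import itertools

import numpy as np


def cut_hdmr2_terms(f, c, x):
    """Second-order cut-HDMR components of f at point x, anchored at cut point c.

    Hypothetical illustration of one HDMR variant, not necessarily the paper's algorithms:
      f0        = f(c)
      f_i(x_i)  = f(c with coordinate i taken from x) - f0
      f_ij(...) = f(c with coordinates i, j taken from x) - f_i - f_j - f0
    """
    c = np.asarray(c, dtype=float)
    x = np.asarray(x, dtype=float)
    f0 = f(c)

    def at(idx):
        # Evaluate f at the cut point with only the coordinates in idx taken from x.
        z = c.copy()
        z[list(idx)] = x[list(idx)]
        return f(z)

    first = {i: at((i,)) - f0 for i in range(c.size)}
    second = {
        (i, j): at((i, j)) - first[i] - first[j] - f0
        for i, j in itertools.combinations(range(c.size), 2)
    }
    return f0, first, second


def cut_hdmr2(f, c, x):
    """Second-order approximation f0 + sum_i f_i + sum_{i<j} f_ij evaluated at x."""
    f0, first, second = cut_hdmr2_terms(f, c, x)
    return f0 + sum(first.values()) + sum(second.values())


def gaussian_llr(mu0, cov0, mu1, cov1):
    """Exact log p1(x) - log p0(x) for two Gaussian classes (a toy stand-in for the LLR)."""
    inv0, inv1 = np.linalg.inv(cov0), np.linalg.inv(cov1)
    logdet0 = np.linalg.slogdet(cov0)[1]
    logdet1 = np.linalg.slogdet(cov1)[1]

    def llr(x):
        d0, d1 = x - mu0, x - mu1
        return (0.5 * d0 @ inv0 @ d0 - 0.5 * d1 @ inv1 @ d1
                + 0.5 * logdet0 - 0.5 * logdet1)

    return llr


# Toy example with hypothetical parameters: 4 "genes", class-specific correlations.
rng = np.random.default_rng(0)
d = 4
mu0, mu1 = np.zeros(d), rng.normal(size=d)
A, B = rng.normal(size=(d, d)), rng.normal(size=(d, d))
cov0, cov1 = A @ A.T + d * np.eye(d), B @ B.T + d * np.eye(d)
llr = gaussian_llr(mu0, cov0, mu1, cov1)

c = (mu0 + mu1) / 2      # cut point (an assumption; any reference point works here)
x = rng.normal(size=d)   # one hypothetical expression profile
f0, first, second = cut_hdmr2_terms(llr, c, x)
label = int(cut_hdmr2(llr, c, x) > 0)  # 1 if class 1 is more likely (equal priors assumed)
```

In this toy model each pairwise term equals (Σ0⁻¹ − Σ1⁻¹)_ij · (x_i − c_i)(x_j − c_j), so it is nonzero only where the two classes' precision matrices differ off the diagonal. With diagonal covariances in both classes, every f_ij vanishes up to rounding. That is the intuition behind reading large pairwise components as class-dependent ("dysregulated") interactions. A real small-sample setting would have to estimate these components from data, which this sketch does not attempt.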

Similar items