
Identification of gene pairs through penalized regression subject to constraints

BACKGROUND: This article concerns the identification of gene pairs or combinations of gene pairs associated with biological phenotype or clinical outcome, allowing for building predictive models that are not only robust to normalization but also easily validated and measured by qPCR techniques. Howe...


Bibliographic Details
Main Authors: Shen, Rex; Luo, Lan; Jiang, Hui
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5670721/
https://www.ncbi.nlm.nih.gov/pubmed/29100492
http://dx.doi.org/10.1186/s12859-017-1872-9
_version_ 1783276093199876096
author Shen, Rex
Luo, Lan
Jiang, Hui
author_facet Shen, Rex
Luo, Lan
Jiang, Hui
author_sort Shen, Rex
collection PubMed
description BACKGROUND: This article concerns the identification of gene pairs or combinations of gene pairs associated with biological phenotype or clinical outcome, allowing for building predictive models that are not only robust to normalization but also easily validated and measured by qPCR techniques. However, given a small number of biological samples yet a large number of genes, this problem suffers from the difficulty of high computational complexity and imposes challenges to the accuracy of identification statistically. RESULTS: In this paper, we propose a parsimonious model representation and develop efficient algorithms for identification. Particularly, we derive an equivalent model subject to a sum-to-zero constraint in penalized linear regression, where the correspondence between nonzero coefficients in these models is established. Most importantly, it reduces the model complexity of the traditional approach from the quadratic order to the linear order in the number of candidate genes, while overcoming the difficulty of model nonidentifiability. Computationally, we develop an algorithm using the alternating direction method of multipliers (ADMM) to deal with the constraint. Numerically, we demonstrate that the proposed method outperforms the traditional method in terms of the statistical accuracy. Moreover, we demonstrate that our ADMM algorithm is more computationally efficient than a coordinate descent algorithm with a local search. Finally, we illustrate the proposed method on a prostate cancer dataset to identify gene pairs that are associated with pre-operative prostate-specific antigen. CONCLUSION: Our findings demonstrate the feasibility and utility of using gene pairs as biomarkers.
format Online
Article
Text
id pubmed-5670721
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-56707212017-11-15 Identification of gene pairs through penalized regression subject to constraints Shen, Rex Luo, Lan Jiang, Hui BMC Bioinformatics Methodology Article BACKGROUND: This article concerns the identification of gene pairs or combinations of gene pairs associated with biological phenotype or clinical outcome, allowing for building predictive models that are not only robust to normalization but also easily validated and measured by qPCR techniques. However, given a small number of biological samples yet a large number of genes, this problem suffers from the difficulty of high computational complexity and imposes challenges to the accuracy of identification statistically. RESULTS: In this paper, we propose a parsimonious model representation and develop efficient algorithms for identification. Particularly, we derive an equivalent model subject to a sum-to-zero constraint in penalized linear regression, where the correspondence between nonzero coefficients in these models is established. Most importantly, it reduces the model complexity of the traditional approach from the quadratic order to the linear order in the number of candidate genes, while overcoming the difficulty of model nonidentifiability. Computationally, we develop an algorithm using the alternating direction method of multipliers (ADMM) to deal with the constraint. Numerically, we demonstrate that the proposed method outperforms the traditional method in terms of the statistical accuracy. Moreover, we demonstrate that our ADMM algorithm is more computationally efficient than a coordinate descent algorithm with a local search. Finally, we illustrate the proposed method on a prostate cancer dataset to identify gene pairs that are associated with pre-operative prostate-specific antigen. CONCLUSION: Our findings demonstrate the feasibility and utility of using gene pairs as biomarkers.
BioMed Central 2017-11-03 /pmc/articles/PMC5670721/ /pubmed/29100492 http://dx.doi.org/10.1186/s12859-017-1872-9 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Methodology Article
Shen, Rex
Luo, Lan
Jiang, Hui
Identification of gene pairs through penalized regression subject to constraints
title Identification of gene pairs through penalized regression subject to constraints
title_full Identification of gene pairs through penalized regression subject to constraints
title_fullStr Identification of gene pairs through penalized regression subject to constraints
title_full_unstemmed Identification of gene pairs through penalized regression subject to constraints
title_short Identification of gene pairs through penalized regression subject to constraints
title_sort identification of gene pairs through penalized regression subject to constraints
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5670721/
https://www.ncbi.nlm.nih.gov/pubmed/29100492
http://dx.doi.org/10.1186/s12859-017-1872-9
work_keys_str_mv AT shenrex identificationofgenepairsthroughpenalizedregressionsubjecttoconstraints
AT luolan identificationofgenepairsthroughpenalizedregressionsubjecttoconstraints
AT jianghui identificationofgenepairsthroughpenalizedregressionsubjecttoconstraints
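
Illustrative note: the abstract describes a penalized linear regression with a sum-to-zero constraint on the coefficients, solved with ADMM. The sketch below is not the authors' implementation; it is a minimal, generic ADMM for an L1-penalized least-squares problem with the constraint sum(beta) = 0, written under several assumptions. The function names (`soft_threshold`, `admm_sum_to_zero_lasso`), the 1/(2n) loss scaling, the splitting beta = z, and the parameters `lam`, `rho`, and `n_iter` are all hypothetical choices made for illustration. Columns of `X` are assumed to be log-scale expression levels, so that a coefficient vector such as (+1, -1, 0, ...) corresponds to the log-ratio of one gene pair.

```python
import numpy as np


def soft_threshold(v, t):
    """Elementwise soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def admm_sum_to_zero_lasso(X, y, lam, rho=1.0, n_iter=1000):
    """Sketch of ADMM for
        minimize (1/(2n)) * ||y - X beta||^2 + lam * ||z||_1
        subject to beta = z and sum(beta) = 0.
    Returns (beta, z): beta satisfies the sum-to-zero constraint,
    z is the sparse copy; the two agree at convergence.
    """
    n, p = X.shape
    ones = np.ones(p)
    # Quadratic term of the beta-update; fixed across iterations.
    A_inv = np.linalg.inv(X.T @ X / n + rho * np.eye(p))
    A_inv_ones = A_inv @ ones
    Xty = X.T @ y / n

    beta = np.zeros(p)
    z = np.zeros(p)
    u = np.zeros(p)  # scaled dual variable for the constraint beta = z
    for _ in range(n_iter):
        # beta-update: ridge-like solve, then correct along A_inv @ ones
        # so that the equality constraint sum(beta) = 0 holds (KKT solution).
        b0 = A_inv @ (Xty + rho * (z - u))
        beta = b0 - A_inv_ones * (ones @ b0) / (ones @ A_inv_ones)
        # z-update: soft-thresholding enforces sparsity.
        z = soft_threshold(beta + u, lam / rho)
        # Dual update.
        u = u + beta - z
    return beta, z


if __name__ == "__main__":
    # Synthetic data (assumption): the outcome depends on the difference
    # of the first two columns, i.e. a single "gene pair".
    rng = np.random.default_rng(0)
    X = rng.standard_normal((50, 10))
    y = X[:, 0] - X[:, 1] + 0.1 * rng.standard_normal(50)
    beta, z = admm_sum_to_zero_lasso(X, y, lam=0.05)
    print("indices of nonzero coefficients:", np.flatnonzero(z))
```

In this sketch the number of unknowns equals the number of candidate genes rather than the number of gene pairs, which is the linear-versus-quadratic reduction the abstract refers to; how the authors map nonzero coefficients back to specific pairs is described in the article itself and is not reproduced here.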