
Chronic Obstructive Pulmonary Disease: Novel Genes Detection with Penalized Logistic Regression

OBJECTIVE: This study aimed to introduce novel techniques for identifying the genes associated with developing chronic obstructive pulmonary disease (COPD) and to prioritize COPD candidate genes using regression methods. MATERIALS AND METHODS: This is a secondary analysis of the data from an experim...


Bibliographic Details
Main Authors: Gohari, Kimiya; Kazemnejad, Anoshirvan; Mostafaei, Shayan; Saberi, Samaneh; Sheidaei, Ali
Format: Online Article Text
Language: English
Published: Royan Institute 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10105299/
https://www.ncbi.nlm.nih.gov/pubmed/37038700
http://dx.doi.org/10.22074/CELLJ.2022.557389.1048
_version_ 1785026181358157824
author Gohari, Kimiya
Kazemnejad, Anoshirvan
Mostafaei, Shayan
Saberi, Samaneh
Sheidaei, Ali
author_facet Gohari, Kimiya
Kazemnejad, Anoshirvan
Mostafaei, Shayan
Saberi, Samaneh
Sheidaei, Ali
author_sort Gohari, Kimiya
collection PubMed
description OBJECTIVE: This study aimed to introduce novel techniques for identifying the genes associated with developing chronic obstructive pulmonary disease (COPD) and to prioritize COPD candidate genes using regression methods. MATERIALS AND METHODS: This is a secondary analysis of the data from an experimental study. We used penalized logistic regression with three different penalties: the least absolute shrinkage and selection operator (LASSO), the minimax concave penalty (MCP), and the smoothly clipped absolute deviation (SCAD). The models were trained using genome-wide expression profiling to define gene networks relevant to the COPD stages. A 10-fold cross-validation scheme was used to evaluate the performance of the methods. In addition, we validated our results using an external validation approach. We reported the sensitivity, specificity, and area under the curve (AUC) of the models. RESULTS: There were 21, 22, and 18 significantly associated genes for the LASSO, SCAD, and MCP models, respectively. The most statistically conservative method (the one detecting the fewest significant features) was MCP, which detected 18 genes, all of which were also detected by the other two approaches. The most appropriate approach was SCAD-penalized logistic regression (AUC=96.26, sensitivity=94.2, specificity=86.96). All three models shared a common panel of 18 genes that show significant positive and negative correlations with COPD, among which RNF130, STX6, PLCB1, CACNA1G, LARP4B, LOC100507634, SLC38A2, and STIM2 had an odds ratio (OR) greater than 1. However, there were slight differences between the penalized methods. CONCLUSION: Regularization solves the serious dimensionality problem that arises with this kind of regression. In this manner, how these genes affect the outcome and mechanism can be explored more quickly. The regression-based approaches we present could be applied to overcome this issue.
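The three penalties named in the description can be compared with a minimal sketch. The record itself gives no formulas or tuning values, so the code below uses the standard textbook definitions of the LASSO, SCAD, and MCP penalties for a single coefficient. The function names, the regularization strength lam, and the shape parameters a=3.7 and gamma=3.0 are illustrative assumptions, not values taken from the study.

```python
# Standard per-coefficient penalty functions (textbook definitions).
# All parameter values here are illustrative assumptions, not the study's.

def lasso_penalty(beta, lam):
    """LASSO: grows linearly with |beta|, so large coefficients keep being shrunk."""
    return lam * abs(beta)


def scad_penalty(beta, lam, a=3.7):
    """SCAD: linear up to lam, quadratic blend up to a*lam, then constant (requires a > 2)."""
    b = abs(beta)
    if b <= lam:
        return lam * b
    if b <= a * lam:
        return (2 * a * lam * b - b * b - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2


def mcp_penalty(beta, lam, gamma=3.0):
    """MCP: penalty rate tapers from lam to zero at gamma*lam, then constant (requires gamma > 1)."""
    b = abs(beta)
    if b <= gamma * lam:
        return lam * b - b * b / (2 * gamma)
    return gamma * lam * lam / 2


if __name__ == "__main__":
    lam = 1.0  # hypothetical regularization strength
    for beta in (0.5, 1.0, 2.0, 5.0, 10.0):
        print(beta, lasso_penalty(beta, lam), scad_penalty(beta, lam), mcp_penalty(beta, lam))
```

Near zero the three penalties behave alike, which is what drives small coefficients to exactly zero and performs gene selection. For large coefficients the SCAD and MCP penalties level off while the LASSO penalty keeps growing, so the nonconvex penalties bias strong effects less. In practice the penalty strength would be tuned by cross-validation, as the description reports.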
format Online
Article
Text
id pubmed-10105299
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Royan Institute
record_format MEDLINE/PubMed
spelling pubmed-10105299 2023-04-16 Chronic Obstructive Pulmonary Disease: Novel Genes Detection with Penalized Logistic Regression Gohari, Kimiya Kazemnejad, Anoshirvan Mostafaei, Shayan Saberi, Samaneh Sheidaei, Ali Cell J Original Article OBJECTIVE: This study aimed to introduce novel techniques for identifying the genes associated with developing chronic obstructive pulmonary disease (COPD) and to prioritize COPD candidate genes using regression methods. MATERIALS AND METHODS: This is a secondary analysis of the data from an experimental study. We used penalized logistic regression with three different penalties: the least absolute shrinkage and selection operator (LASSO), the minimax concave penalty (MCP), and the smoothly clipped absolute deviation (SCAD). The models were trained using genome-wide expression profiling to define gene networks relevant to the COPD stages. A 10-fold cross-validation scheme was used to evaluate the performance of the methods. In addition, we validated our results using an external validation approach. We reported the sensitivity, specificity, and area under the curve (AUC) of the models. RESULTS: There were 21, 22, and 18 significantly associated genes for the LASSO, SCAD, and MCP models, respectively. The most statistically conservative method (the one detecting the fewest significant features) was MCP, which detected 18 genes, all of which were also detected by the other two approaches. The most appropriate approach was SCAD-penalized logistic regression (AUC=96.26, sensitivity=94.2, specificity=86.96). All three models shared a common panel of 18 genes that show significant positive and negative correlations with COPD, among which RNF130, STX6, PLCB1, CACNA1G, LARP4B, LOC100507634, SLC38A2, and STIM2 had an odds ratio (OR) greater than 1. However, there were slight differences between the penalized methods. CONCLUSION: Regularization solves the serious dimensionality problem that arises with this kind of regression. In this manner, how these genes affect the outcome and mechanism can be explored more quickly. The regression-based approaches we present could be applied to overcome this issue. Royan Institute 2023-03 2023-03-07 /pmc/articles/PMC10105299/ /pubmed/37038700 http://dx.doi.org/10.22074/CELLJ.2022.557389.1048 Text en Any use, distribution, reproduction or abstract of this publication in any medium, with the exception of commercial purposes, is permitted provided the original work is properly cited. https://creativecommons.org/licenses/by-nc/3.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution Non-Commercial 3.0 (CC BY-NC 3.0) License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Original Article
Gohari, Kimiya
Kazemnejad, Anoshirvan
Mostafaei, Shayan
Saberi, Samaneh
Sheidaei, Ali
Chronic Obstructive Pulmonary Disease: Novel Genes Detection with Penalized Logistic Regression
title Chronic Obstructive Pulmonary Disease: Novel Genes Detection with Penalized Logistic Regression
topic Original Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10105299/
https://www.ncbi.nlm.nih.gov/pubmed/37038700
http://dx.doi.org/10.22074/CELLJ.2022.557389.1048