A Novel Algorithm for Feature Selection Using Penalized Regression with Applications to Single-Cell RNA Sequencing Data †
SIMPLE SUMMARY: Single-cell RNA sequencing generates gene expression data at single-cell resolution. While single-cell RNA sequencing has many applications in biomedical research, the high dimensionality of the data produced poses a considerable computational challenge. This study proposes a novel algorithm...
Main Authors: | Sen Puliparambil, Bhavithry; Tomal, Jabed H.; Yan, Yan |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2022 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9598401/ https://www.ncbi.nlm.nih.gov/pubmed/36290397 http://dx.doi.org/10.3390/biology11101495 |
_version_ | 1784816324911824896 |
---|---|
author | Sen Puliparambil, Bhavithry Tomal, Jabed H. Yan, Yan |
author_sort | Sen Puliparambil, Bhavithry |
collection | PubMed |
description | SIMPLE SUMMARY: Single-cell RNA sequencing generates gene expression data at single-cell resolution. While single-cell RNA sequencing has many applications in biomedical research, the high dimensionality of the data produced poses a considerable computational challenge. This study proposes a novel algorithm using penalized regression methods to analyze single-cell RNA sequencing data. The proposed algorithm reduces the high dimensionality of the gene expression data using a sequence of feature selection methods such as Ridge regression, LASSO, Elastic Net, Drop LASSO, and Sparse Group LASSO. The proposed algorithm successfully detected highly differentiated genes, including the marker genes, for 5 different single-cell RNA sequencing datasets associated with the species mouse, plant, and human. ABSTRACT: With the emergence of single-cell RNA sequencing (scRNA-seq) technology, scientists are able to examine gene expression at single-cell resolution. Analysis of scRNA-seq data has its own challenges, which stem from its high dimensionality. Machine learning offers the potential for gene (feature) selection from high-dimensional scRNA-seq data. Even though there exist multiple machine learning methods that appear to be suitable for feature selection, such as penalized regression, there is no rigorous comparison of their performances across data sets, where each poses its own challenges. Therefore, in this paper, we analyzed and compared multiple penalized regression methods for scRNA-seq data. Given the scRNA-seq data sets we analyzed, the results show that sparse group lasso (SGL) outperforms the other six methods (ridge, lasso, elastic net, drop lasso, group lasso, and big lasso) using the metrics area under the receiver operating characteristic curve (AUC) and computation time. Building on these findings, we proposed a new algorithm for feature selection using penalized regression methods. The proposed algorithm works by selecting a small subset of genes and applying SGL to select the differentially expressed genes in scRNA-seq data. By using hierarchical clustering to group genes, the proposed method bypasses the need for domain-specific knowledge for gene grouping information. In addition, the proposed algorithm provided consistently better AUC for the data sets used. |
format | Online Article Text |
id | pubmed-9598401 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-9598401 2022-10-27 A Novel Algorithm for Feature Selection Using Penalized Regression with Applications to Single-Cell RNA Sequencing Data † Sen Puliparambil, Bhavithry Tomal, Jabed H. Yan, Yan Biology (Basel) Article MDPI 2022-10-12 /pmc/articles/PMC9598401/ /pubmed/36290397 http://dx.doi.org/10.3390/biology11101495 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
title | A Novel Algorithm for Feature Selection Using Penalized Regression with Applications to Single-Cell RNA Sequencing Data † |
title_sort | novel algorithm for feature selection using penalized regression with applications to single-cell rna sequencing data † |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9598401/ https://www.ncbi.nlm.nih.gov/pubmed/36290397 http://dx.doi.org/10.3390/biology11101495 |
work_keys_str_mv | AT senpuliparambilbhavithry anovelalgorithmforfeatureselectionusingpenalizedregressionwithapplicationstosinglecellrnasequencingdata AT tomaljabedh anovelalgorithmforfeatureselectionusingpenalizedregressionwithapplicationstosinglecellrnasequencingdata AT yanyan anovelalgorithmforfeatureselectionusingpenalizedregressionwithapplicationstosinglecellrnasequencingdata AT senpuliparambilbhavithry novelalgorithmforfeatureselectionusingpenalizedregressionwithapplicationstosinglecellrnasequencingdata AT tomaljabedh novelalgorithmforfeatureselectionusingpenalizedregressionwithapplicationstosinglecellrnasequencingdata AT yanyan novelalgorithmforfeatureselectionusingpenalizedregressionwithapplicationstosinglecellrnasequencingdata |
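
The description above names sparse group lasso (SGL) as the selection step applied to grouped genes, but this record does not give the algorithm's details. As a rough illustration only, the sketch below shows the standard proximal (shrinkage) step for the usual SGL penalty, alpha * lam * ||b||_1 + (1 - alpha) * lam * sum over groups of sqrt(p_g) * ||b_g||_2. It is not the authors' implementation: the function names `soft_threshold` and `sgl_prox`, the parameters `lam` and `alpha`, and the toy group labels (which stand in for groups that would come from hierarchical clustering) are all assumptions made for this example.

```python
import math


def soft_threshold(value, threshold):
    """Lasso-style shrinkage: move value toward zero by threshold."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sgl_prox(coefs, groups, lam, alpha):
    """Proximal step for the sparse group lasso penalty (illustrative sketch).

    coefs  : list of coefficient values, one per gene (hypothetical input)
    groups : list of group labels, same length as coefs
    lam    : overall penalty strength (assumed name)
    alpha  : mix between lasso (alpha=1) and group lasso (alpha=0)
    """
    # Step 1: elementwise shrinkage handles the L1 part of the penalty.
    shrunk = [soft_threshold(c, alpha * lam) for c in coefs]

    # Step 2: shrink each group as a whole; a group whose norm falls
    # below its threshold is set to zero entirely.
    result = [0.0] * len(coefs)
    for label in set(groups):
        idx = [i for i, g in enumerate(groups) if g == label]
        norm = math.sqrt(sum(shrunk[i] ** 2 for i in idx))
        group_threshold = (1 - alpha) * lam * math.sqrt(len(idx))
        if norm > group_threshold:
            scale = 1 - group_threshold / norm
            for i in idx:
                result[i] = scale * shrunk[i]
    return result


if __name__ == "__main__":
    # Toy example: two groups of two "genes" each.
    print(sgl_prox([3.0, 4.0, 0.1, -0.2], [0, 0, 1, 1], lam=1.0, alpha=0.5))
```

In this toy call the second group has only small coefficients, so the whole group is zeroed out, while the first group is kept and shrunk. That group-level and within-group sparsity is what makes SGL usable as a gene selection step once genes have been assigned to groups.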