RgCop-A regularized copula based method for gene selection in single-cell RNA-seq data

Gene selection in large unannotated single-cell RNA sequencing (scRNA-seq) data is a crucial preliminary step of downstream analysis. The existing approaches are primarily based on high variation (highly variable genes) or significantly high expression (highly expressed genes)...

Bibliographic Details
Main authors: Lall, Snehalika; Ray, Sumanta; Bandyopadhyay, Sanghamitra
Format: Online Article Text
Language: English
Published: Public Library of Science 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8568278/
https://www.ncbi.nlm.nih.gov/pubmed/34665808
http://dx.doi.org/10.1371/journal.pcbi.1009464
author Lall, Snehalika
Ray, Sumanta
Bandyopadhyay, Sanghamitra
collection PubMed
description Gene selection in large unannotated single-cell RNA sequencing (scRNA-seq) data is a crucial preliminary step of downstream analysis. Existing approaches, which are primarily based on high variation (highly variable genes) or significantly high expression (highly expressed genes), fail to provide a stable and predictive feature set because of technical noise present in the data. Here, we propose RgCop, a novel regularized copula-based method for gene selection from large single-cell RNA-seq data. RgCop utilizes copula correlation (Ccor), a robust, equitable dependence measure that captures multivariate dependency among a set of genes in single-cell expression data. We formulate an objective function by adding an l1 regularization term to Ccor to penalize the redundant coefficients of features/genes, resulting in a non-redundant, effective feature/gene set. Results show a significant improvement in clustering/classification performance on real-life scRNA-seq data over other state-of-the-art methods. RgCop performs extremely well in capturing dependence among the features of noisy data owing to the scale-invariant property of copulas, thereby improving the stability of the method. Moreover, the differentially expressed (DE) genes identified from the clusters of scRNA-seq data are found to provide an accurate annotation of cells. Finally, the features/genes obtained from RgCop are able to annotate unknown cells with high accuracy.
format Online
Article
Text
id pubmed-8568278
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-8568278 2021-11-05 RgCop-A regularized copula based method for gene selection in single-cell RNA-seq data Lall, Snehalika Ray, Sumanta Bandyopadhyay, Sanghamitra PLoS Comput Biol Research Article
Public Library of Science 2021-10-19 /pmc/articles/PMC8568278/ /pubmed/34665808 http://dx.doi.org/10.1371/journal.pcbi.1009464 Text en © 2021 Lall et al https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title RgCop-A regularized copula based method for gene selection in single-cell RNA-seq data
topic Research Article