Cargando…

Dimensionality Reduction and Louvain Agglomerative Hierarchical Clustering for Cluster-Specified Frequent Biomarker Discovery in Single-Cell Sequencing Data

The major interest domains of single-cell RNA sequential analysis are identification of existing and novel types of cells, depiction of cells, cell fate prediction, classification of several types of tumor, and investigation of heterogeneity in different cells. Single-cell clustering plays an import...

Descripción completa

Detalles Bibliográficos
Autores principales: Seth , Soumita, Mallik , Saurav, Bhadra , Tapas, Zhao , Zhongming
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Frontiers Media S.A. 2022
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8859265/
https://www.ncbi.nlm.nih.gov/pubmed/35198011
http://dx.doi.org/10.3389/fgene.2022.828479
_version_ 1784654418263670784
author Seth , Soumita
Mallik , Saurav
Bhadra , Tapas
Zhao , Zhongming
author_facet Seth , Soumita
Mallik , Saurav
Bhadra , Tapas
Zhao , Zhongming
author_sort Seth , Soumita
collection PubMed
description The major interest domains of single-cell RNA sequential analysis are identification of existing and novel types of cells, depiction of cells, cell fate prediction, classification of several types of tumor, and investigation of heterogeneity in different cells. Single-cell clustering plays an important role to solve the aforementioned questions of interest. Cluster identification in high dimensional single-cell sequencing data faces some challenges due to its nature. Dimensionality reduction models can solve the problem. Here, we introduce a potential cluster specified frequent biomarkers discovery framework using dimensionality reduction and hierarchical agglomerative clustering Louvain for single-cell RNA sequencing data analysis. First, we pre-filtered the features with fewer number of cells and the cells with fewer number of features. Then we created a Seurat object to store data and analysis together and used quality control metrics to discard low quality or dying cells. Afterwards we applied global-scaling normalization method “LogNormalize” for data normalization. Next, we computed cell-to-cell highly variable features from our dataset. Then, we applied a linear transformation and linear dimensionality reduction technique, Principal Component Analysis (PCA) to project high dimensional data to an optimal low-dimensional space. After identifying fifty “significant”principal components (PCs) based on strong enrichment of low p-value features, we implemented a graph-based clustering algorithm Louvain for the cell clustering of 10 top significant PCs. We applied our model to a single-cell RNA sequential dataset for a rare intestinal cell type in mice (NCBI accession ID:GSE62270, 23,630 features and 1872 samples (cells)). We obtained 10 cell clusters with a maximum modularity of 0.885 1. After detecting the cell clusters, we found 3871 cluster-specific biomarkers using an expression feature extraction statistical tool for single-cell sequencing data, Model-based Analysis of Single-cell Transcriptomics (MAST) with a   log (2) FC threshold of 0.25 and a minimum feature detection of 25%. From these cluster-specific biomarkers, we found 1892 most frequent markers, i.e., overlapping biomarkers. We performed degree hub gene network analysis using Cytoscape and reported the five highest degree genes (Rps4x, Rps18, Rpl13a, Rps12 and Rpl18a). Subsequently, we performed KEGG pathway and Gene Ontology enrichment analysis of cluster markers using David 6.8 software tool. In summary, our proposed framework that integrated dimensionality reduction and agglomerative hierarchical clustering provides a robust approach to efficiently discover cluster-specific frequent biomarkers, i.e., overlapping biomarkers from single-cell RNA sequencing data.
format Online
Article
Text
id pubmed-8859265
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-88592652022-02-22 Dimensionality Reduction and Louvain Agglomerative Hierarchical Clustering for Cluster-Specified Frequent Biomarker Discovery in Single-Cell Sequencing Data Seth , Soumita Mallik , Saurav Bhadra , Tapas Zhao , Zhongming Front Genet Genetics The major interest domains of single-cell RNA sequential analysis are identification of existing and novel types of cells, depiction of cells, cell fate prediction, classification of several types of tumor, and investigation of heterogeneity in different cells. Single-cell clustering plays an important role to solve the aforementioned questions of interest. Cluster identification in high dimensional single-cell sequencing data faces some challenges due to its nature. Dimensionality reduction models can solve the problem. Here, we introduce a potential cluster specified frequent biomarkers discovery framework using dimensionality reduction and hierarchical agglomerative clustering Louvain for single-cell RNA sequencing data analysis. First, we pre-filtered the features with fewer number of cells and the cells with fewer number of features. Then we created a Seurat object to store data and analysis together and used quality control metrics to discard low quality or dying cells. Afterwards we applied global-scaling normalization method “LogNormalize” for data normalization. Next, we computed cell-to-cell highly variable features from our dataset. Then, we applied a linear transformation and linear dimensionality reduction technique, Principal Component Analysis (PCA) to project high dimensional data to an optimal low-dimensional space. After identifying fifty “significant”principal components (PCs) based on strong enrichment of low p-value features, we implemented a graph-based clustering algorithm Louvain for the cell clustering of 10 top significant PCs. We applied our model to a single-cell RNA sequential dataset for a rare intestinal cell type in mice (NCBI accession ID:GSE62270, 23,630 features and 1872 samples (cells)). We obtained 10 cell clusters with a maximum modularity of 0.885 1. After detecting the cell clusters, we found 3871 cluster-specific biomarkers using an expression feature extraction statistical tool for single-cell sequencing data, Model-based Analysis of Single-cell Transcriptomics (MAST) with a   log (2) FC threshold of 0.25 and a minimum feature detection of 25%. From these cluster-specific biomarkers, we found 1892 most frequent markers, i.e., overlapping biomarkers. We performed degree hub gene network analysis using Cytoscape and reported the five highest degree genes (Rps4x, Rps18, Rpl13a, Rps12 and Rpl18a). Subsequently, we performed KEGG pathway and Gene Ontology enrichment analysis of cluster markers using David 6.8 software tool. In summary, our proposed framework that integrated dimensionality reduction and agglomerative hierarchical clustering provides a robust approach to efficiently discover cluster-specific frequent biomarkers, i.e., overlapping biomarkers from single-cell RNA sequencing data. Frontiers Media S.A. 2022-02-07 /pmc/articles/PMC8859265/ /pubmed/35198011 http://dx.doi.org/10.3389/fgene.2022.828479 Text en Copyright © 2022 Seth , Mallik , Bhadra  and Zhao . https://creativecommons.org/licenses/by/4.0/This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Genetics
Seth , Soumita
Mallik , Saurav
Bhadra , Tapas
Zhao , Zhongming
Dimensionality Reduction and Louvain Agglomerative Hierarchical Clustering for Cluster-Specified Frequent Biomarker Discovery in Single-Cell Sequencing Data
title Dimensionality Reduction and Louvain Agglomerative Hierarchical Clustering for Cluster-Specified Frequent Biomarker Discovery in Single-Cell Sequencing Data
title_full Dimensionality Reduction and Louvain Agglomerative Hierarchical Clustering for Cluster-Specified Frequent Biomarker Discovery in Single-Cell Sequencing Data
title_fullStr Dimensionality Reduction and Louvain Agglomerative Hierarchical Clustering for Cluster-Specified Frequent Biomarker Discovery in Single-Cell Sequencing Data
title_full_unstemmed Dimensionality Reduction and Louvain Agglomerative Hierarchical Clustering for Cluster-Specified Frequent Biomarker Discovery in Single-Cell Sequencing Data
title_short Dimensionality Reduction and Louvain Agglomerative Hierarchical Clustering for Cluster-Specified Frequent Biomarker Discovery in Single-Cell Sequencing Data
title_sort dimensionality reduction and louvain agglomerative hierarchical clustering for cluster-specified frequent biomarker discovery in single-cell sequencing data
topic Genetics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8859265/
https://www.ncbi.nlm.nih.gov/pubmed/35198011
http://dx.doi.org/10.3389/fgene.2022.828479
work_keys_str_mv AT sethsoumita dimensionalityreductionandlouvainagglomerativehierarchicalclusteringforclusterspecifiedfrequentbiomarkerdiscoveryinsinglecellsequencingdata
AT malliksaurav dimensionalityreductionandlouvainagglomerativehierarchicalclusteringforclusterspecifiedfrequentbiomarkerdiscoveryinsinglecellsequencingdata
AT bhadratapas dimensionalityreductionandlouvainagglomerativehierarchicalclusteringforclusterspecifiedfrequentbiomarkerdiscoveryinsinglecellsequencingdata
AT zhaozhongming dimensionalityreductionandlouvainagglomerativehierarchicalclusteringforclusterspecifiedfrequentbiomarkerdiscoveryinsinglecellsequencingdata