Single-Cell Transcriptome Data Clustering via Multinomial Modeling and Adaptive Fuzzy K-Means Algorithm

Single-cell RNA sequencing technologies make it possible to study tissue heterogeneity at cellular resolution. Fast-developing platforms such as droplet-based sequencing can process thousands of single cells in parallel. Although a unique molecular identifier (UMI) ca...

Bibliographic Details
Main authors: Chen, Liang; Wang, Weinan; Zhai, Yuyao; Deng, Minghua
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2020
Subjects: Genetics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7180207/
https://www.ncbi.nlm.nih.gov/pubmed/32362908
http://dx.doi.org/10.3389/fgene.2020.00295
author Chen, Liang
Wang, Weinan
Zhai, Yuyao
Deng, Minghua
collection PubMed
description Single-cell RNA sequencing technologies make it possible to study tissue heterogeneity at cellular resolution. Fast-developing platforms such as droplet-based sequencing can process thousands of single cells in parallel. Although unique molecular identifiers (UMIs) can remove amplification bias to a certain extent, clustering such sparse, high-dimensional, large-scale discrete data remains challenging. Most existing deep learning-based clustering methods denoise single-cell UMI count data with a mean squared error loss or a negative binomial distribution, with or without zero inflation, which may underfit or overfit the gene expression profiles. In addition, neglecting the molecule sampling mechanism and extracting representations by simple linear dimension reduction followed by a hard clustering algorithm may distort the data structure and lead to spurious analytical results. In this paper, we combine a deep autoencoder with statistical modeling to develop a novel clustering method, scDMFK, for single-cell transcriptome UMI count data. scDMFK uses a multinomial distribution to characterize the data structure and relies on a neural network to facilitate estimation of the model parameters. In the learned low-dimensional latent space, we propose an adaptive fuzzy k-means algorithm with entropy regularization to perform soft clustering. Simulations under various scenarios and analyses of 10 real datasets show that scDMFK outperforms other state-of-the-art methods in both data modeling and clustering. scDMFK also scales well to large single-cell datasets.
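The two ingredients named in the description, a multinomial likelihood for UMI counts and entropy-regularized soft clustering in a latent space, can be illustrated with a minimal NumPy sketch. This is not the authors' scDMFK implementation. The function names (`multinomial_nll`, `soft_memberships`, `entropy_fuzzy_kmeans`), the parameter `lam`, and the use of plain squared Euclidean distance are all assumptions made for illustration, and the paper's adaptive weighting and autoencoder are not reproduced. Under these assumptions, minimizing the sum of membership-weighted distances plus `lam` times the negative entropy of the memberships gives a softmax over negative distances for the memberships and a membership-weighted mean for each center.

```python
import numpy as np


def multinomial_nll(counts, logits):
    """Mean per-cell multinomial negative log-likelihood.

    counts: (cells, genes) UMI counts; logits: (cells, genes) unnormalized scores.
    The multinomial coefficient depends only on the counts, so it is dropped.
    """
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_p = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-(counts * log_p).sum(axis=1).mean())


def soft_memberships(z, centers, lam):
    """Row-wise softmax of -squared_distance / lam (rows sum to 1)."""
    d = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    scores = -d / lam
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    u = np.exp(scores)
    return u / u.sum(axis=1, keepdims=True)


def entropy_fuzzy_kmeans(z, k, lam=1.0, n_iter=50, init_centers=None, seed=0):
    """Alternate soft-membership and weighted-mean center updates.

    Hypothetical sketch: plain squared Euclidean distance is assumed here,
    not the adaptive variant described in the paper.
    """
    rng = np.random.default_rng(seed)
    if init_centers is None:
        # Start from k distinct latent points chosen at random.
        centers = z[rng.choice(len(z), size=k, replace=False)].astype(float)
    else:
        centers = np.asarray(init_centers, dtype=float).copy()
    for _ in range(n_iter):
        u = soft_memberships(z, centers, lam)
        # Each center becomes the membership-weighted mean of the points.
        centers = (u.T @ z) / (u.sum(axis=0)[:, None] + 1e-12)
    return soft_memberships(z, centers, lam), centers
```

A larger `lam` spreads each cell's membership across more clusters, while a small `lam` approaches hard k-means assignments. In a full pipeline, `z` would be the autoencoder's latent representation and `logits` the decoder output; both are stand-ins here.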
format Online
Article
Text
id pubmed-7180207
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-7180207 2020-05-01 Single-Cell Transcriptome Data Clustering via Multinomial Modeling and Adaptive Fuzzy K-Means Algorithm. Chen, Liang; Wang, Weinan; Zhai, Yuyao; Deng, Minghua. Front Genet. Genetics. Frontiers Media S.A. 2020-04-17 /pmc/articles/PMC7180207/ /pubmed/32362908 http://dx.doi.org/10.3389/fgene.2020.00295 Text en Copyright © 2020 Chen, Wang, Zhai and Deng. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Single-Cell Transcriptome Data Clustering via Multinomial Modeling and Adaptive Fuzzy K-Means Algorithm
topic Genetics