
Autoencoder-based cluster ensembles for single-cell RNA-seq data analysis

BACKGROUND: Single-cell RNA-sequencing (scRNA-seq) is a transformative technology, allowing global transcriptomes of individual cells to be profiled with high accuracy. An essential task in scRNA-seq data analysis is the identification of cell types from complex samples or tissues profiled in an exp...


Bibliographic Details
Main authors:	Geddes, Thomas A., Kim, Taiyun, Nan, Lihao, Burchfield, James G., Yang, Jean Y. H., Tao, Dacheng, Yang, Pengyi
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2019
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6929272/ https://www.ncbi.nlm.nih.gov/pubmed/31870278 http://dx.doi.org/10.1186/s12859-019-3179-5

author	Geddes, Thomas A. Kim, Taiyun Nan, Lihao Burchfield, James G. Yang, Jean Y. H. Tao, Dacheng Yang, Pengyi
author_sort	Geddes, Thomas A.
collection	PubMed
description	BACKGROUND: Single-cell RNA-sequencing (scRNA-seq) is a transformative technology, allowing global transcriptomes of individual cells to be profiled with high accuracy. An essential task in scRNA-seq data analysis is the identification of cell types from complex samples or tissues profiled in an experiment. To this end, clustering has become a key computational technique for grouping cells based on their transcriptome profiles, enabling subsequent cell type identification from each cluster of cells. Due to the high feature-dimensionality of the transcriptome (i.e. the large number of measured genes in each cell) and because only a small fraction of genes are cell type-specific and therefore informative for generating cell type-specific clusters, clustering directly on the original feature/gene dimension may lead to uninformative clusters and hinder correct cell type identification. RESULTS: Here, we propose an autoencoder-based cluster ensemble framework in which we first take random subspace projections from the data, then compress each random projection to a low-dimensional space using an autoencoder artificial neural network, and finally apply ensemble clustering across all encoded datasets to generate clusters of cells. We employ four evaluation metrics to benchmark clustering performance and our experiments demonstrate that the proposed autoencoder-based cluster ensemble can lead to substantially improved cell type-specific clusters when applied with both the standard k-means clustering algorithm and a state-of-the-art kernel-based clustering algorithm (SIMLR) designed specifically for scRNA-seq data. Compared to directly using these clustering algorithms on the original datasets, the performance improvement in some cases is up to 100%, depending on the evaluation metric used. CONCLUSIONS: Our results suggest that the proposed framework can facilitate more accurate cell type identification as well as other downstream analyses. 
The code for creating the proposed autoencoder-based cluster ensemble framework is freely available from https://github.com/gedcom/scCCESS
format	Online Article Text
id	pubmed-6929272
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-6929272 2019-12-30 Autoencoder-based cluster ensembles for single-cell RNA-seq data analysis Geddes, Thomas A. Kim, Taiyun Nan, Lihao Burchfield, James G. Yang, Jean Y. H. Tao, Dacheng Yang, Pengyi BMC Bioinformatics Research BioMed Central 2019-12-24 /pmc/articles/PMC6929272/ /pubmed/31870278 http://dx.doi.org/10.1186/s12859-019-3179-5 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	Autoencoder-based cluster ensembles for single-cell RNA-seq data analysis
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6929272/ https://www.ncbi.nlm.nih.gov/pubmed/31870278 http://dx.doi.org/10.1186/s12859-019-3179-5
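
The pipeline summarized in the RESULTS part of the description (random subspace projections, compression of each projection, then ensemble clustering across the encoded datasets) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' scCCESS implementation: the autoencoder is represented by a placeholder `encode` function that returns its input unchanged, the base clusterer is plain k-means with a deterministic farthest-point start, and the consensus step is a thresholded co-association grouping, which the record does not specify. All function names, parameters, and the toy data are hypothetical.

```python
import random


def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def encode(cells):
    """Placeholder for the autoencoder step (assumption): input is returned unchanged."""
    return cells


def kmeans(points, k, n_iter=10):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(dist2(p, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster empties out
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def consensus_labels(label_sets, threshold=0.5):
    """Merge cells that share a cluster in more than `threshold` of the base clusterings."""
    n = len(label_sets[0])
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            together = sum(ls[i] == ls[j] for ls in label_sets) / len(label_sets)
            if together > threshold:
                parent[find(i)] = find(j)
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


def ensemble_cluster(cells, k, n_runs=5, frac=0.5, seed=0):
    """Cluster several random gene subsets and combine the results."""
    rng = random.Random(seed)
    n_genes = len(cells[0])
    label_sets = []
    for _ in range(n_runs):
        # Random subspace projection: keep a random subset of genes.
        genes = rng.sample(range(n_genes), max(1, int(n_genes * frac)))
        projected = [[cell[g] for g in genes] for cell in cells]
        label_sets.append(kmeans(encode(projected), k))
    return consensus_labels(label_sets)


# Toy data (hypothetical): six cells, eight genes, two well-separated groups.
data_rng = random.Random(1)
cells = [[base + data_rng.uniform(-1, 1) for _ in range(8)]
         for base in (0, 0, 0, 10, 10, 10)]
labels = ensemble_cluster(cells, k=2)
print(labels)
```

In this sketch the number of consensus clusters follows from the threshold rather than from `k`, so a real analysis would likely swap in a consensus method that fixes the cluster count, and a trained encoder in place of `encode`.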
