
SAIC: an iterative clustering approach for analysis of single cell RNA-seq data

BACKGROUND: Research interests toward single cell analysis have greatly increased in basic, translational and clinical research areas recently, as advances in whole-transcriptome amplification technique allow scientists to get accurate sequencing result at single cell level. An important step in the...


Bibliographic Details
Main Authors: Yang, Lu, Liu, Jiancheng, Lu, Qiang, Riggs, Arthur D., Wu, Xiwei
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5629617/
https://www.ncbi.nlm.nih.gov/pubmed/28984204
http://dx.doi.org/10.1186/s12864-017-4019-5
author Yang, Lu
Liu, Jiancheng
Lu, Qiang
Riggs, Arthur D.
Wu, Xiwei
collection PubMed
description BACKGROUND: Research interest in single cell analysis has greatly increased recently in basic, translational and clinical research areas, as advances in whole-transcriptome amplification techniques allow scientists to obtain accurate sequencing results at the single cell level. An important step in single-cell transcriptome analysis is to identify distinct cell groups that have different gene expression patterns. Currently, only limited bioinformatics approaches are available for single-cell RNA-seq analysis. Many studies rely on principal component analysis (PCA) with arbitrary parameters to identify the genes that will be used to cluster the single cells. RESULTS: We have developed a novel algorithm, called SAIC (Single cell Analysis via Iterative Clustering), that identifies the optimal set of signature genes to separate single cells into distinct groups. Our method utilizes an iterative clustering approach to perform an exhaustive search for the best parameters within the search space, which is defined by a number of initial centers and P values. The end point is the identification of a signature gene set that gives the best separation of the cell clusters. Using a simulated data set, we showed that SAIC can successfully identify the pre-defined signature gene sets that correctly separate the cells into predefined clusters. We applied SAIC to two published single cell RNA-seq datasets. For both datasets, SAIC was able to identify a subset of signature genes that cluster the single cells into groups consistent with the published results. The signature genes identified by SAIC resulted in better clusters of cells based on DB index score, and many genes also showed tissue-specific expression. CONCLUSIONS: In summary, we have developed an efficient algorithm to identify the optimal subset of genes that separate single cells into distinct clusters based on their expression patterns. We have shown that it performs better than the PCA method on published single cell RNA-seq datasets. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-017-4019-5) contains supplementary material, which is available to authorized users.
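The abstract scores cluster quality with a "DB index". The sketch below assumes this means the Davies-Bouldin index, for which lower values indicate tighter, better separated clusters. It is a minimal standard-library illustration, not the SAIC implementation; the function names, the toy "cells" and the label vectors are all hypothetical.

```python
import math

def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def davies_bouldin(points, labels):
    """Davies-Bouldin index of a labelled point set (lower is better).

    Assumes at least two clusters and no two clusters sharing a centroid
    (coinciding centroids would cause a division by zero).
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")

    keys = sorted(clusters)
    centers = {k: centroid(clusters[k]) for k in keys}
    # Scatter: mean distance from each cluster member to its centroid.
    scatter = {
        k: sum(math.dist(p, centers[k]) for p in clusters[k]) / len(clusters[k])
        for k in keys
    }
    # For each cluster, take its worst (largest) similarity ratio to any other.
    worst = [
        max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in keys
            if j != i
        )
        for i in keys
    ]
    return sum(worst) / len(keys)

# Hypothetical toy data: four "cells" described by two "genes".
cells = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good_labels = [0, 0, 1, 1]  # groups cells that are close together
bad_labels = [0, 1, 0, 1]   # mixes distant cells in each group

good_score = davies_bouldin(cells, good_labels)
bad_score = davies_bouldin(cells, bad_labels)
```

In a parameter search like the one the abstract describes, a score of this kind could be computed for each candidate combination of initial centers and P value, keeping the gene set with the lowest score; how SAIC actually combines these steps is not stated in this record.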
format Online
Article
Text
id pubmed-5629617
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5629617 2017-10-13 SAIC: an iterative clustering approach for analysis of single cell RNA-seq data Yang, Lu Liu, Jiancheng Lu, Qiang Riggs, Arthur D. Wu, Xiwei BMC Genomics Research BioMed Central 2017-10-03 /pmc/articles/PMC5629617/ /pubmed/28984204 http://dx.doi.org/10.1186/s12864-017-4019-5 Text en © The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title SAIC: an iterative clustering approach for analysis of single cell RNA-seq data
topic Research