Detection of high variability in gene expression from single-cell RNA-seq profiling

Bibliographic Details
Main authors: Chen, Hung-I Harry; Jin, Yufang; Huang, Yufei; Chen, Yidong
Format: Online, Article, Text
Language: English
Published: BioMed Central 2016
Subjects: Research
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5001205/
https://www.ncbi.nlm.nih.gov/pubmed/27556924
http://dx.doi.org/10.1186/s12864-016-2897-6

Description:

BACKGROUND: Advances in next-generation sequencing technology make it possible to map gene expression at the single-cell level with single-cell RNA sequencing (scRNA-seq), which can track cell heterogeneity and identify cell subpopulations. Unlike conventional RNA-seq, where differential expression analysis is the central component, the most important goal of scRNA-seq is to identify genes that are highly variable across a population of cells, while accounting for the discrete nature of single-cell gene expression and the distinct library preparation protocols used for single-cell sequencing. However, there is no generic expression variation model that applies across different scRNA-seq data sets. The objective of this study is therefore to develop a gene expression variation model (GEVM) that uses the relationship between the coefficient of variation (CV) and the average expression level to address the over-dispersion of single-cell data, together with the corresponding statistical significance, to quantify variably expressed genes (VEGs).

RESULTS: We built a simulation framework that generates scRNA-seq data with different numbers of cells, model parameters, and variation levels. We implemented the GEVM and demonstrated its robustness on a set of simulated scRNA-seq data under different conditions. We evaluated the robustness of the regression using the root-mean-square error (RMSE) and assessed the parameter estimation process by varying the initial model parameters so that they deviated from those of a homogeneous cell population. We also applied the GEVM to real scRNA-seq data to test its performance in distinct cases.

CONCLUSIONS: In this paper, we proposed a gene expression variation model that can be used to determine significantly variably expressed genes. Applying the model to simulated single-cell data, we observed robust parameter estimation under different conditions, with minimal root-mean-square errors. We also examined the model on two distinct scRNA-seq data sets generated with different single-cell protocols and determined their VEGs. The VEGs revealed possible subpopulations, providing further evidence of cell heterogeneity. With the GEVM, significantly variably expressed genes can be readily identified in different scRNA-seq data sets.
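
The abstract does not state the functional form of the GEVM, so the following is a hypothetical sketch of the general idea, not the authors' method. It assumes a common over-dispersion relationship CV² = a/μ + b (Poisson-like noise plus a constant extra term), fits it by ordinary least squares on 1/μ, scores each gene by the ratio of observed to expected CV² with a chi-square-style test, applies a Bonferroni cutoff, and reports the RMSE of the fit. The names `detect_vegs`, `fit_cv2_curve`, `score_genes` and `chi2_sf`, the two-pass refit, and the use of the Wilson-Hilferty approximation for the chi-square tail are all illustrative assumptions.

```python
import math
import statistics
from dataclasses import dataclass


@dataclass
class GeneCall:
    gene: str
    mean: float
    cv2: float           # observed squared coefficient of variation
    expected_cv2: float  # CV^2 predicted by the fitted curve at this mean
    p_value: float


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail P(X >= x), X ~ chi-square(df), via the Wilson-Hilferty
    cube-root normal approximation (adequate for a sketch, not exact)."""
    if x <= 0:
        return 1.0
    mu = 1.0 - 2.0 / (9.0 * df)
    sd = math.sqrt(2.0 / (9.0 * df))
    z = ((x / df) ** (1.0 / 3.0) - mu) / sd
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def fit_cv2_curve(means, cv2s):
    """Ordinary least squares fit of the assumed model CV^2 = a / mean + b."""
    xs = [1.0 / m for m in means]
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(cv2s)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, cv2s))
    a = sxy / sxx
    return a, y_bar - a * x_bar


def score_genes(stats, a, b):
    """Chi-square-style test of observed vs. expected CV^2 for each gene."""
    calls = []
    for gene, (mean, cv2, n) in stats.items():
        expected = max(a / mean + b, 1e-12)  # guard against a non-positive fit
        t = (n - 1) * cv2 / expected
        calls.append(GeneCall(gene, mean, cv2, expected, chi2_sf(t, n - 1)))
    return calls


def detect_vegs(expr, alpha=0.05):
    """expr maps gene -> per-cell normalized expression values.
    Returns (a, b, rmse, vegs), with vegs sorted by p-value."""
    stats = {}
    for gene, values in expr.items():
        mean = statistics.fmean(values)
        if mean > 0 and len(values) > 1:  # CV is undefined for all-zero genes
            stats[gene] = (mean, statistics.variance(values) / mean ** 2, len(values))
    if not stats:
        raise ValueError("no gene with positive mean expression")
    cutoff = alpha / len(stats)  # Bonferroni correction over tested genes

    # Pass 1 fits on all genes; pass 2 refits on genes not flagged in pass 1,
    # so a few extreme genes do not distort the baseline curve.
    a, b = fit_cv2_curve([s[0] for s in stats.values()], [s[1] for s in stats.values()])
    background = [c for c in score_genes(stats, a, b) if c.p_value >= cutoff]
    if len(background) >= 3:
        a, b = fit_cv2_curve([c.mean for c in background], [c.cv2 for c in background])

    calls = score_genes(stats, a, b)
    residuals = [(c.cv2 - c.expected_cv2) ** 2 for c in calls if c.p_value >= cutoff]
    rmse = math.sqrt(statistics.fmean(residuals)) if residuals else float("nan")
    vegs = sorted((c for c in calls if c.p_value < cutoff), key=lambda c: c.p_value)
    return a, b, rmse, vegs


if __name__ == "__main__":
    expr = {}
    for m in [1, 2, 4, 8, 16, 32]:
        d = math.sqrt(0.9 * (m + 0.1 * m * m))  # sample variance m + 0.1*m^2
        expr[f"gene_{m}"] = [m - d, m + d] * 5   # ten cells per gene
    expr["spiky"] = [0.0] * 9 + [50.0]  # expressed in one cell out of ten
    a, b, rmse, vegs = detect_vegs(expr)
    print(f"a={a:.3f} b={b:.3f} rmse={rmse:.2e}")
    print([v.gene for v in vegs])
```

The toy input builds six genes whose CV² lies exactly on 1/μ + 0.1 and one gene that is expressed in a single cell; the refit recovers the background curve and flags only the single-cell gene. A real analysis would more likely use exact chi-square tails (for example from SciPy) and a robust or nonlinear regression, and would work from properly normalized counts.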

Journal: BMC Genomics (Research), published by BioMed Central on 2016-08-22
Source: PubMed collection, National Center for Biotechnology Information (record pubmed-5001205, MEDLINE/PubMed format)
License: © The Author(s). 2016. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.