Cargando…

ccImpute: an accurate and scalable consensus clustering based algorithm to impute dropout events in the single-cell RNA-seq data

BACKGROUND: In recent years, the introduction of single-cell RNA sequencing (scRNA-seq) has enabled the analysis of a cell’s transcriptome at an unprecedented granularity and processing speed. The experimental outcome of applying this technology is a [Formula: see text] matrix containing aggregated...

Descripción completa

Detalles Bibliográficos
Autores principales:	Malec, Marcin, Kurban, Hasan, Dalkilic, Mehmet
Formato:	Online Artículo Texto
Lenguaje:	English
Publicado:	BioMed Central 2022
Materias:	Research
Acceso en línea:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9306045/ https://www.ncbi.nlm.nih.gov/pubmed/35869420 http://dx.doi.org/10.1186/s12859-022-04814-8

_version_	1784752460530712576
author	Malec, Marcin Kurban, Hasan Dalkilic, Mehmet
author_facet	Malec, Marcin Kurban, Hasan Dalkilic, Mehmet
author_sort	Malec, Marcin
collection	PubMed
description	BACKGROUND: In recent years, the introduction of single-cell RNA sequencing (scRNA-seq) has enabled the analysis of a cell’s transcriptome at an unprecedented granularity and processing speed. The experimental outcome of applying this technology is a [Formula: see text] matrix containing aggregated mRNA expression counts of M genes and N cell samples. From this matrix, scientists can study how cell protein synthesis changes in response to various factors, for example, disease versus non-disease states in response to a treatment protocol. This technology’s critical challenge is detecting and accurately recording lowly expressed genes. As a result, low expression levels tend to be missed and recorded as zero - an event known as dropout. This makes the lowly expressed genes indistinguishable from true zero expression and different than the low expression present in cells of the same type. This issue makes any subsequent downstream analysis difficult. RESULTS: To address this problem, we propose an approach to measure cell similarity using consensus clustering and demonstrate an effective and efficient algorithm that takes advantage of this new similarity measure to impute the most probable dropout events in the scRNA-seq datasets. We demonstrate that our approach exceeds the performance of existing imputation approaches while introducing the least amount of new noise as measured by clustering performance characteristics on datasets with known cell identities. CONCLUSIONS: ccImpute is an effective algorithm to correct for dropout events and thus improve downstream analysis of scRNA-seq data. ccImpute is implemented in R and is available at https://github.com/khazum/ccImpute.
format	Online Article Text
id	pubmed-9306045
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-93060452022-07-23 ccImpute: an accurate and scalable consensus clustering based algorithm to impute dropout events in the single-cell RNA-seq data Malec, Marcin Kurban, Hasan Dalkilic, Mehmet BMC Bioinformatics Research BACKGROUND: In recent years, the introduction of single-cell RNA sequencing (scRNA-seq) has enabled the analysis of a cell’s transcriptome at an unprecedented granularity and processing speed. The experimental outcome of applying this technology is a [Formula: see text] matrix containing aggregated mRNA expression counts of M genes and N cell samples. From this matrix, scientists can study how cell protein synthesis changes in response to various factors, for example, disease versus non-disease states in response to a treatment protocol. This technology’s critical challenge is detecting and accurately recording lowly expressed genes. As a result, low expression levels tend to be missed and recorded as zero - an event known as dropout. This makes the lowly expressed genes indistinguishable from true zero expression and different than the low expression present in cells of the same type. This issue makes any subsequent downstream analysis difficult. RESULTS: To address this problem, we propose an approach to measure cell similarity using consensus clustering and demonstrate an effective and efficient algorithm that takes advantage of this new similarity measure to impute the most probable dropout events in the scRNA-seq datasets. We demonstrate that our approach exceeds the performance of existing imputation approaches while introducing the least amount of new noise as measured by clustering performance characteristics on datasets with known cell identities. CONCLUSIONS: ccImpute is an effective algorithm to correct for dropout events and thus improve downstream analysis of scRNA-seq data. ccImpute is implemented in R and is available at https://github.com/khazum/ccImpute. BioMed Central 2022-07-22 /pmc/articles/PMC9306045/ /pubmed/35869420 http://dx.doi.org/10.1186/s12859-022-04814-8 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/Open AccessThis article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/) . The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/ (https://creativecommons.org/publicdomain/zero/1.0/) ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle	Research Malec, Marcin Kurban, Hasan Dalkilic, Mehmet ccImpute: an accurate and scalable consensus clustering based algorithm to impute dropout events in the single-cell RNA-seq data
title	ccImpute: an accurate and scalable consensus clustering based algorithm to impute dropout events in the single-cell RNA-seq data
title_full	ccImpute: an accurate and scalable consensus clustering based algorithm to impute dropout events in the single-cell RNA-seq data
title_fullStr	ccImpute: an accurate and scalable consensus clustering based algorithm to impute dropout events in the single-cell RNA-seq data
title_full_unstemmed	ccImpute: an accurate and scalable consensus clustering based algorithm to impute dropout events in the single-cell RNA-seq data
title_short	ccImpute: an accurate and scalable consensus clustering based algorithm to impute dropout events in the single-cell RNA-seq data
title_sort	ccimpute: an accurate and scalable consensus clustering based algorithm to impute dropout events in the single-cell rna-seq data
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9306045/ https://www.ncbi.nlm.nih.gov/pubmed/35869420 http://dx.doi.org/10.1186/s12859-022-04814-8
work_keys_str_mv	AT malecmarcin ccimputeanaccurateandscalableconsensusclusteringbasedalgorithmtoimputedropouteventsinthesinglecellrnaseqdata AT kurbanhasan ccimputeanaccurateandscalableconsensusclusteringbasedalgorithmtoimputedropouteventsinthesinglecellrnaseqdata AT dalkilicmehmet ccimputeanaccurateandscalableconsensusclusteringbasedalgorithmtoimputedropouteventsinthesinglecellrnaseqdata

ccImpute: an accurate and scalable consensus clustering based algorithm to impute dropout events in the single-cell RNA-seq data

Ejemplares similares