clusterExperiment and RSEC: A Bioconductor package and framework for clustering of single-cell and other large gene expression datasets

Clustering of genes and/or samples is a common task in gene expression analysis. The goals in clustering can vary, but an important scenario is that of finding biologically meaningful subtypes within the samples. This is an application that is particularly appropriate when there are large numbers of...

Bibliographic Details
Main authors: Risso, Davide; Purvis, Liam; Fletcher, Russell B.; Das, Diya; Ngai, John; Dudoit, Sandrine; Purdom, Elizabeth
Format: Online Article Text
Language: English
Published: Public Library of Science 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6138422/
https://www.ncbi.nlm.nih.gov/pubmed/30180157
http://dx.doi.org/10.1371/journal.pcbi.1006378
author Risso, Davide
Purvis, Liam
Fletcher, Russell B.
Das, Diya
Ngai, John
Dudoit, Sandrine
Purdom, Elizabeth
collection PubMed
description Clustering of genes and/or samples is a common task in gene expression analysis. The goals of clustering can vary, but an important scenario is that of finding biologically meaningful subtypes within the samples. This application is particularly appropriate when there are large numbers of samples, as in many human disease studies. With the increasing popularity of single-cell transcriptome sequencing (RNA-Seq), many more controlled experiments on model organisms are similarly creating large gene expression datasets with the goal of detecting previously unknown heterogeneity within cells. In the detection of novel subtypes, it is common to run many clustering algorithms and to rely on subsampling and ensemble methods to improve robustness. We introduce a Bioconductor R package, clusterExperiment, that implements a general and flexible strategy we call Resampling-based Sequential Ensemble Clustering (RSEC). RSEC enables the user to easily create multiple, competing clusterings of the data based on different techniques and associated tuning parameters, including easy integration of resampling and sequential clustering, and then provides methods for consolidating the multiple clusterings into a final consensus clustering. The package is modular and allows the user to separately apply the individual components of the RSEC procedure, i.e., apply multiple clustering algorithms, create a consensus clustering or choose tuning parameters, and merge clusters. Additionally, clusterExperiment provides a variety of visualization tools for the clustering process, as well as methods for the identification of possible cluster signatures or biomarkers. The R package clusterExperiment is publicly available through the Bioconductor Project, with a detailed manual (vignette) and well-documented help pages for each function.
format Online
Article
Text
id pubmed-6138422
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-6138422 2018-09-27 clusterExperiment and RSEC: A Bioconductor package and framework for clustering of single-cell and other large gene expression datasets. Risso, Davide; Purvis, Liam; Fletcher, Russell B.; Das, Diya; Ngai, John; Dudoit, Sandrine; Purdom, Elizabeth. PLoS Comput Biol. Research Article. Public Library of Science 2018-09-04 /pmc/articles/PMC6138422/ /pubmed/30180157 http://dx.doi.org/10.1371/journal.pcbi.1006378 Text en © 2018 Risso et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title clusterExperiment and RSEC: A Bioconductor package and framework for clustering of single-cell and other large gene expression datasets
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6138422/
https://www.ncbi.nlm.nih.gov/pubmed/30180157
http://dx.doi.org/10.1371/journal.pcbi.1006378