Random forest based similarity learning for single cell RNA sequencing data
MOTIVATION: Genome-wide transcriptome sequencing applied to single cells (scRNA-seq) is rapidly becoming an assay of choice across many fields of biological and biomedical research. Scientific objectives often revolve around discovery or characterization of types or sub-types of cells, and therefore...
Main Authors: | Pouyan, Maziyar Baran; Kostka, Dennis |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2018 |
Subjects: | Ismb 2018–Intelligent Systems for Molecular Biology Proceedings |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6022547/ https://www.ncbi.nlm.nih.gov/pubmed/29950006 http://dx.doi.org/10.1093/bioinformatics/bty260 |
_version_ | 1783335701513764864 |
---|---|
author | Pouyan, Maziyar Baran; Kostka, Dennis |
author_facet | Pouyan, Maziyar Baran; Kostka, Dennis |
author_sort | Pouyan, Maziyar Baran |
collection | PubMed |
description | MOTIVATION: Genome-wide transcriptome sequencing applied to single cells (scRNA-seq) is rapidly becoming an assay of choice across many fields of biological and biomedical research. Scientific objectives often revolve around discovery or characterization of types or sub-types of cells, and therefore, obtaining accurate cell–cell similarities from scRNA-seq data is a critical step in many studies. While rapid advances are being made in the development of tools for scRNA-seq data analysis, few approaches exist that explicitly address this task. Furthermore, abundance and type of noise present in scRNA-seq datasets suggest that application of generic methods, or of methods developed for bulk RNA-seq data, is likely suboptimal. RESULTS: Here, we present RAFSIL, a random forest based approach to learn cell–cell similarities from scRNA-seq data. RAFSIL implements a two-step procedure, where feature construction geared towards scRNA-seq data is followed by similarity learning. It is designed to be adaptable and expandable, and RAFSIL similarities can be used for typical exploratory data analysis tasks like dimension reduction, visualization and clustering. We show that our approach compares favorably with current methods across a diverse collection of datasets, and that it can be used to detect and highlight unwanted technical variation in scRNA-seq datasets in situations where other methods fail. Overall, RAFSIL implements a flexible approach yielding a useful tool that improves the analysis of scRNA-seq data. AVAILABILITY AND IMPLEMENTATION: The RAFSIL R package is available at www.kostkalab.net/software.html SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-6022547 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-6022547 2018-07-10 Random forest based similarity learning for single cell RNA sequencing data Pouyan, Maziyar Baran Kostka, Dennis Bioinformatics Ismb 2018–Intelligent Systems for Molecular Biology Proceedings
Oxford University Press 2018-07-01 2018-06-27 /pmc/articles/PMC6022547/ /pubmed/29950006 http://dx.doi.org/10.1093/bioinformatics/bty260 Text en © The Author(s) 2018. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Ismb 2018–Intelligent Systems for Molecular Biology Proceedings Pouyan, Maziyar Baran Kostka, Dennis Random forest based similarity learning for single cell RNA sequencing data |
title | Random forest based similarity learning for single cell RNA sequencing data |
title_sort | random forest based similarity learning for single cell rna sequencing data |
topic | Ismb 2018–Intelligent Systems for Molecular Biology Proceedings |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6022547/ https://www.ncbi.nlm.nih.gov/pubmed/29950006 http://dx.doi.org/10.1093/bioinformatics/bty260 |
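
The description above says that cell–cell similarities are learned with a random forest, but this record gives no algorithmic detail. The following is a minimal sketch of the general idea behind tree-ensemble similarity only: two cells count as similar when they frequently land in the same leaf across many trees. It is not the RAFSIL implementation. As a simplifying assumption it uses completely random trees instead of a trained random forest, it skips the feature construction step entirely, and the function names (`build_leaves`, `forest_similarity`) and the toy expression values are hypothetical.

```python
import random


def build_leaves(cells, idx, rng, min_leaf=2):
    """Split the cell indices in `idx` with random axis-aligned cuts.

    Returns a list of leaves; each leaf is a list of cell indices.
    """
    if len(idx) <= min_leaf:
        return [idx]
    gene = rng.randrange(len(cells[0]))      # pick one feature at random
    values = [cells[i][gene] for i in idx]
    lo, hi = min(values), max(values)
    if lo == hi:                             # feature is constant here: stop
        return [idx]
    cut = rng.uniform(lo, hi)                # random threshold inside the range
    left = [i for i in idx if cells[i][gene] < cut]
    right = [i for i in idx if cells[i][gene] >= cut]
    if not left or not right:                # degenerate cut: keep as one leaf
        return [idx]
    return (build_leaves(cells, left, rng, min_leaf)
            + build_leaves(cells, right, rng, min_leaf))


def forest_similarity(cells, n_trees=200, min_leaf=2, seed=0):
    """Fraction of trees in which each pair of cells shares a leaf."""
    rng = random.Random(seed)
    n = len(cells)
    counts = [[0] * n for _ in range(n)]
    for _ in range(n_trees):
        for leaf in build_leaves(cells, list(range(n)), rng, min_leaf):
            for i in leaf:
                for j in leaf:
                    counts[i][j] += 1
    return [[c / n_trees for c in row] for row in counts]


# Toy expression matrix (made-up values): rows are cells, columns are features.
cells = [
    [0.0, 0.1], [0.2, 0.0], [0.1, 0.3],   # three cells of one hypothetical type
    [5.0, 5.2], [5.3, 5.1], [5.1, 4.9],   # three cells of another
]
sim = forest_similarity(cells)
dissim = [[1.0 - s for s in row] for row in sim]   # distance-like matrix
```

A matrix such as `dissim` is the kind of input that the exploratory tasks named in the description (dimension reduction, visualization, clustering) typically accept; which downstream method to use is left open here.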