Random forest based similarity learning for single cell RNA sequencing data

MOTIVATION: Genome-wide transcriptome sequencing applied to single cells (scRNA-seq) is rapidly becoming an assay of choice across many fields of biological and biomedical research. Scientific objectives often revolve around discovery or characterization of types or sub-types of cells, and therefore...

Bibliographic Details
Main Authors: Pouyan, Maziyar Baran; Kostka, Dennis
Format: Online Article Text
Language: English
Published: Oxford University Press 2018
Subjects: Ismb 2018–Intelligent Systems for Molecular Biology Proceedings
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6022547/
https://www.ncbi.nlm.nih.gov/pubmed/29950006
http://dx.doi.org/10.1093/bioinformatics/bty260
author Pouyan, Maziyar Baran
Kostka, Dennis
collection PubMed
description MOTIVATION: Genome-wide transcriptome sequencing applied to single cells (scRNA-seq) is rapidly becoming an assay of choice across many fields of biological and biomedical research. Scientific objectives often revolve around discovery or characterization of types or sub-types of cells, and therefore, obtaining accurate cell–cell similarities from scRNA-seq data is a critical step in many studies. While rapid advances are being made in the development of tools for scRNA-seq data analysis, few approaches exist that explicitly address this task. Furthermore, abundance and type of noise present in scRNA-seq datasets suggest that application of generic methods, or of methods developed for bulk RNA-seq data, is likely suboptimal. RESULTS: Here, we present RAFSIL, a random forest based approach to learn cell–cell similarities from scRNA-seq data. RAFSIL implements a two-step procedure, where feature construction geared towards scRNA-seq data is followed by similarity learning. It is designed to be adaptable and expandable, and RAFSIL similarities can be used for typical exploratory data analysis tasks like dimension reduction, visualization and clustering. We show that our approach compares favorably with current methods across a diverse collection of datasets, and that it can be used to detect and highlight unwanted technical variation in scRNA-seq datasets in situations where other methods fail. Overall, RAFSIL implements a flexible approach yielding a useful tool that improves the analysis of scRNA-seq data. AVAILABILITY AND IMPLEMENTATION: The RAFSIL R package is available at www.kostkalab.net/software.html SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-6022547
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6022547 2018-07-10 Bioinformatics Ismb 2018–Intelligent Systems for Molecular Biology Proceedings Oxford University Press 2018-07-01 2018-06-27 /pmc/articles/PMC6022547/ /pubmed/29950006 http://dx.doi.org/10.1093/bioinformatics/bty260 Text en © The Author(s) 2018. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Random forest based similarity learning for single cell RNA sequencing data
topic Ismb 2018–Intelligent Systems for Molecular Biology Proceedings