
A Tool for Visualization and Analysis of Single-Cell RNA-Seq Data Based on Text Mining

Gene expression in individual cells can now be measured for thousands of cells in a single experiment thanks to innovative sample-preparation and sequencing technologies. State-of-the-art computational pipelines for single-cell RNA-sequencing data, however, still employ computational methods that we...


Bibliographic Details
Main authors: Gambardella, Gennaro; di Bernardo, Diego
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6696874/
https://www.ncbi.nlm.nih.gov/pubmed/31447887
http://dx.doi.org/10.3389/fgene.2019.00734
author Gambardella, Gennaro
di Bernardo, Diego
collection PubMed
description Gene expression in individual cells can now be measured for thousands of cells in a single experiment thanks to innovative sample-preparation and sequencing technologies. State-of-the-art computational pipelines for single-cell RNA-sequencing data, however, still employ computational methods that were developed for traditional bulk RNA-sequencing data, thus not accounting for the peculiarities of single-cell data, such as sparseness and zero-inflated counts. Here, we present a ready-to-use pipeline named gf-icf (gene frequency–inverse cell frequency) for normalization of raw counts, feature selection, and dimensionality reduction of scRNA-seq data for their visualization and subsequent analyses. Our work is based on a data transformation model named term frequency–inverse document frequency (TF-IDF), which has been extensively used in the field of text mining where extremely sparse and zero-inflated data are common. Using benchmark scRNA-seq datasets, we show that the gf-icf pipeline outperforms existing state-of-the-art methods in terms of improved visualization and ability to separate and distinguish different cell types.
format Online
Article
Text
id pubmed-6696874
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-6696874 2019-08-23 A Tool for Visualization and Analysis of Single-Cell RNA-Seq Data Based on Text Mining Gambardella, Gennaro; di Bernardo, Diego Front Genet Genetics Frontiers Media S.A. 2019-08-09 /pmc/articles/PMC6696874/ /pubmed/31447887 http://dx.doi.org/10.3389/fgene.2019.00734 Text en Copyright © 2019 Gambardella and di Bernardo http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title A Tool for Visualization and Analysis of Single-Cell RNA-Seq Data Based on Text Mining
topic Genetics
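
The description in this record says that gf-icf is based on the TF-IDF transformation from text mining, with genes playing the role of terms and cells the role of documents. The following is a minimal sketch of that general idea only, not the authors' implementation: the function name `gf_icf_sketch`, the genes-by-cells matrix layout, and the smoothed logarithm (a convention borrowed from common TF-IDF practice) are all assumptions made for illustration.

```python
import numpy as np


def gf_icf_sketch(counts):
    """Apply a TF-IDF-style weighting to a genes x cells count matrix.

    Hypothetical helper for illustration; the layout (rows = genes,
    columns = cells) and the smoothing are assumptions, not taken
    from the published pipeline.
    """
    counts = np.asarray(counts, dtype=float)
    n_cells = counts.shape[1]

    # Gene frequency: each count divided by the total counts of its cell.
    totals = counts.sum(axis=0, keepdims=True)
    gf = np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)

    # Inverse cell frequency: genes detected in fewer cells get a larger weight.
    # The +1 terms are a smoothing assumption that avoids division by zero.
    cells_expressing = (counts > 0).sum(axis=1, keepdims=True)
    icf = np.log((n_cells + 1) / (cells_expressing + 1)) + 1.0

    return gf * icf


# Toy matrix: gene 0 is detected in one cell only, gene 1 in every cell.
example = np.array([[2, 0, 0],
                    [1, 1, 1]])
scores = gf_icf_sketch(example)
```

In this toy matrix the gene detected in a single cell receives a higher weight in that cell than the gene detected everywhere, and zero counts stay zero, so the sparsity of the input is preserved.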