A Tool for Visualization and Analysis of Single-Cell RNA-Seq Data Based on Text Mining
Gene expression in individual cells can now be measured for thousands of cells in a single experiment thanks to innovative sample-preparation and sequencing technologies. State-of-the-art computational pipelines for single-cell RNA-sequencing data, however, still employ computational methods that we...
Main authors: | Gambardella, Gennaro; di Bernardo, Diego |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2019 |
Subjects: | Genetics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6696874/ https://www.ncbi.nlm.nih.gov/pubmed/31447887 http://dx.doi.org/10.3389/fgene.2019.00734 |
_version_ | 1783444343823007744 |
---|---|
author | Gambardella, Gennaro; di Bernardo, Diego |
author_facet | Gambardella, Gennaro; di Bernardo, Diego |
author_sort | Gambardella, Gennaro |
collection | PubMed |
description | Gene expression in individual cells can now be measured for thousands of cells in a single experiment thanks to innovative sample-preparation and sequencing technologies. State-of-the-art computational pipelines for single-cell RNA-sequencing data, however, still employ computational methods that were developed for traditional bulk RNA-sequencing data, thus not accounting for the peculiarities of single-cell data, such as sparseness and zero-inflated counts. Here, we present a ready-to-use pipeline named gf-icf (gene frequency–inverse cell frequency) for normalization of raw counts, feature selection, and dimensionality reduction of scRNA-seq data for their visualization and subsequent analyses. Our work is based on a data transformation model named term frequency–inverse document frequency (TF-IDF), which has been extensively used in the field of text mining where extremely sparse and zero-inflated data are common. Using benchmark scRNA-seq datasets, we show that the gf-icf pipeline outperforms existing state-of-the-art methods in terms of improved visualization and ability to separate and distinguish different cell types. |
format | Online Article Text |
id | pubmed-6696874 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-66968742019-08-23 A Tool for Visualization and Analysis of Single-Cell RNA-Seq Data Based on Text Mining Gambardella, Gennaro di Bernardo, Diego Front Genet Genetics Gene expression in individual cells can now be measured for thousands of cells in a single experiment thanks to innovative sample-preparation and sequencing technologies. State-of-the-art computational pipelines for single-cell RNA-sequencing data, however, still employ computational methods that were developed for traditional bulk RNA-sequencing data, thus not accounting for the peculiarities of single-cell data, such as sparseness and zero-inflated counts. Here, we present a ready-to-use pipeline named gf-icf (gene frequency–inverse cell frequency) for normalization of raw counts, feature selection, and dimensionality reduction of scRNA-seq data for their visualization and subsequent analyses. Our work is based on a data transformation model named term frequency–inverse document frequency (TF-IDF), which has been extensively used in the field of text mining where extremely sparse and zero-inflated data are common. Using benchmark scRNA-seq datasets, we show that the gf-icf pipeline outperforms existing state-of-the-art methods in terms of improved visualization and ability to separate and distinguish different cell types. Frontiers Media S.A. 2019-08-09 /pmc/articles/PMC6696874/ /pubmed/31447887 http://dx.doi.org/10.3389/fgene.2019.00734 Text en Copyright © 2019 Gambardella and di Bernardo http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Genetics Gambardella, Gennaro di Bernardo, Diego A Tool for Visualization and Analysis of Single-Cell RNA-Seq Data Based on Text Mining |
title | A Tool for Visualization and Analysis of Single-Cell RNA-Seq Data Based on Text Mining |
title_full | A Tool for Visualization and Analysis of Single-Cell RNA-Seq Data Based on Text Mining |
title_fullStr | A Tool for Visualization and Analysis of Single-Cell RNA-Seq Data Based on Text Mining |
title_full_unstemmed | A Tool for Visualization and Analysis of Single-Cell RNA-Seq Data Based on Text Mining |
title_short | A Tool for Visualization and Analysis of Single-Cell RNA-Seq Data Based on Text Mining |
title_sort | tool for visualization and analysis of single-cell rna-seq data based on text mining |
topic | Genetics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6696874/ https://www.ncbi.nlm.nih.gov/pubmed/31447887 http://dx.doi.org/10.3389/fgene.2019.00734 |
work_keys_str_mv | AT gambardellagennaro atoolforvisualizationandanalysisofsinglecellrnaseqdatabasedontextmining AT dibernardodiego atoolforvisualizationandanalysisofsinglecellrnaseqdatabasedontextmining AT gambardellagennaro toolforvisualizationandanalysisofsinglecellrnaseqdatabasedontextmining AT dibernardodiego toolforvisualizationandanalysisofsinglecellrnaseqdatabasedontextmining |
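
The description above says the gf-icf pipeline adapts the TF-IDF transformation from text mining to single-cell count data, treating cells as documents and genes as terms. The following is a minimal sketch of that general idea only, not the authors' implementation: the function name `gf_icf`, the smoothed logarithm in the inverse cell frequency, and the final L2 normalization of each cell are assumptions borrowed from common TF-IDF practice, and the record does not say which variant the published pipeline uses.

```python
import numpy as np


def gf_icf(counts):
    """TF-IDF-style weighting of a cells x genes raw count matrix.

    Hypothetical helper for illustration; the smoothing and the L2
    normalization are assumptions, not taken from the record above.
    """
    counts = np.asarray(counts, dtype=float)
    n_cells = counts.shape[0]

    # Gene frequency: each gene's share of the total counts in its cell.
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # leave empty cells as all zeros
    gf = counts / totals

    # Inverse cell frequency: genes detected in few cells get more weight.
    # With this smoothing, a gene detected in every cell gets weight 0.
    cells_expressing = (counts > 0).sum(axis=0)
    icf = np.log((n_cells + 1.0) / (cells_expressing + 1.0))

    weighted = gf * icf

    # Scale each cell to unit Euclidean length (assumed, as in text mining).
    norms = np.linalg.norm(weighted, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    return weighted / norms


# Toy matrix: 3 cells (rows) x 3 genes (columns).
toy_counts = [[5, 0, 1],
              [3, 2, 0],
              [4, 0, 0]]
toy_scores = gf_icf(toy_counts)
```

In this toy matrix the first gene is detected in every cell, so its weight is zero everywhere, while the two genes detected in a single cell each carry the whole signal for that cell. This mirrors how TF-IDF downweights words that appear in every document.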