An integrated single-cell transcriptomic dataset for non-small cell lung cancer

Bibliographic Details
Main Authors: Prazanowska, Karolina Hanna, Lim, Su Bin
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10042991/
https://www.ncbi.nlm.nih.gov/pubmed/36973297
http://dx.doi.org/10.1038/s41597-023-02074-6
_version_ 1784913053330964480
author Prazanowska, Karolina Hanna
Lim, Su Bin
author_facet Prazanowska, Karolina Hanna
Lim, Su Bin
author_sort Prazanowska, Karolina Hanna
collection PubMed
description As single-cell RNA sequencing (scRNA-seq) has emerged as a great tool for studying cellular heterogeneity within the past decade, the number of available scRNA-seq datasets also rapidly increased. However, reuse of such data is often problematic due to a small cohort size, limited cell types, and insufficient information on cell type classification. Here, we present a large integrated scRNA-seq dataset containing 224,611 cells from human primary non-small cell lung cancer (NSCLC) tumors. Using publicly available resources, we pre-processed and integrated seven independent scRNA-seq datasets using an anchor-based approach, with five datasets utilized as reference and the remaining two, as validation. We created two levels of annotation based on cell type-specific markers conserved across the datasets. To demonstrate usability of the integrated dataset, we created annotation predictions for the two validation datasets using our integrated reference. Additionally, we conducted a trajectory analysis on subsets of T cells and lung cancer cells. This integrated data may serve as a resource for studying NSCLC transcriptome at the single cell level.
format Online
Article
Text
id pubmed-10042991
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-10042991 2023-03-29 An integrated single-cell transcriptomic dataset for non-small cell lung cancer Prazanowska, Karolina Hanna Lim, Su Bin Sci Data Analysis As single-cell RNA sequencing (scRNA-seq) has emerged as a great tool for studying cellular heterogeneity within the past decade, the number of available scRNA-seq datasets also rapidly increased. However, reuse of such data is often problematic due to a small cohort size, limited cell types, and insufficient information on cell type classification. Here, we present a large integrated scRNA-seq dataset containing 224,611 cells from human primary non-small cell lung cancer (NSCLC) tumors. Using publicly available resources, we pre-processed and integrated seven independent scRNA-seq datasets using an anchor-based approach, with five datasets utilized as reference and the remaining two, as validation. We created two levels of annotation based on cell type-specific markers conserved across the datasets. To demonstrate usability of the integrated dataset, we created annotation predictions for the two validation datasets using our integrated reference. Additionally, we conducted a trajectory analysis on subsets of T cells and lung cancer cells. This integrated data may serve as a resource for studying NSCLC transcriptome at the single cell level. Nature Publishing Group UK 2023-03-27 /pmc/articles/PMC10042991/ /pubmed/36973297 http://dx.doi.org/10.1038/s41597-023-02074-6 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
spellingShingle Analysis
Prazanowska, Karolina Hanna
Lim, Su Bin
An integrated single-cell transcriptomic dataset for non-small cell lung cancer
title An integrated single-cell transcriptomic dataset for non-small cell lung cancer
title_sort integrated single-cell transcriptomic dataset for non-small cell lung cancer
topic Analysis
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10042991/
https://www.ncbi.nlm.nih.gov/pubmed/36973297
http://dx.doi.org/10.1038/s41597-023-02074-6