
Multi-resolution characterization of molecular taxonomies in bulk and single-cell transcriptomics data

As high-throughput genomics assays become more efficient and cost effective, their utilization has become standard in large-scale biomedical projects. These studies are often explorative, in that relationships between samples are not explicitly defined a priori, but rather emerge from data-driven di...


Bibliographic Details
Main authors: Reed, Eric R; Monti, Stefano
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects: Methods Online
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8464061/
https://www.ncbi.nlm.nih.gov/pubmed/34226941
http://dx.doi.org/10.1093/nar/gkab552
_version_ 1784572538275233792
author Reed, Eric R
Monti, Stefano
collection PubMed
description As high-throughput genomics assays become more efficient and cost-effective, their utilization has become standard in large-scale biomedical projects. These studies are often explorative, in that relationships between samples are not explicitly defined a priori, but rather emerge from data-driven discovery and annotation of molecular subtypes, thereby informing hypotheses and independent evaluation. Here, we present K2Taxonomer, a novel unsupervised recursive partitioning algorithm and associated R package that utilize ensemble learning to identify robust subgroups in a ‘taxonomy-like’ structure. K2Taxonomer was devised to accommodate different data paradigms, and is suitable for the analysis of both bulk and single-cell transcriptomics data, as well as other ‘-omics’ data. For each of these data types, we demonstrate the power of K2Taxonomer to discover known relationships in both simulated and human tissue data. We conclude with a practical application on breast cancer tumor-infiltrating lymphocyte (TIL) single-cell profiles, in which we identified co-expression of translational machinery genes as a dominant transcriptional program shared by T cell subtypes, associated with better prognosis in breast cancer tissue bulk expression data.
format Online
Article
Text
id pubmed-8464061
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8464061 2021-09-27 Multi-resolution characterization of molecular taxonomies in bulk and single-cell transcriptomics data. Reed, Eric R; Monti, Stefano. Nucleic Acids Res. Methods Online. Oxford University Press 2021-07-06 /pmc/articles/PMC8464061/ /pubmed/34226941 http://dx.doi.org/10.1093/nar/gkab552 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of Nucleic Acids Research. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Multi-resolution characterization of molecular taxonomies in bulk and single-cell transcriptomics data
topic Methods Online