Multi-resolution characterization of molecular taxonomies in bulk and single-cell transcriptomics data
As high-throughput genomics assays become more efficient and cost effective, their utilization has become standard in large-scale biomedical projects. These studies are often explorative, in that relationships between samples are not explicitly defined a priori, but rather emerge from data-driven di...
Main authors: | Reed, Eric R; Monti, Stefano |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2021 |
Subjects: | Methods Online |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8464061/ https://www.ncbi.nlm.nih.gov/pubmed/34226941 http://dx.doi.org/10.1093/nar/gkab552 |
_version_ | 1784572538275233792 |
---|---|
author | Reed, Eric R Monti, Stefano |
author_facet | Reed, Eric R Monti, Stefano |
author_sort | Reed, Eric R |
collection | PubMed |
description | As high-throughput genomics assays become more efficient and cost effective, their utilization has become standard in large-scale biomedical projects. These studies are often explorative, in that relationships between samples are not explicitly defined a priori, but rather emerge from data-driven discovery and annotation of molecular subtypes, thereby informing hypotheses and independent evaluation. Here, we present K2Taxonomer, a novel unsupervised recursive partitioning algorithm and associated R package that utilize ensemble learning to identify robust subgroups in a ‘taxonomy-like’ structure. K2Taxonomer was devised to accommodate different data paradigms, and is suitable for the analysis of both bulk and single-cell transcriptomics, and other ‘-omics’, data. For each of these data types, we demonstrate the power of K2Taxonomer to discover known relationships in both simulated and human tissue data. We conclude with a practical application on breast cancer tumor infiltrating lymphocyte (TIL) single-cell profiles, in which we identified co-expression of translational machinery genes as a dominant transcriptional program shared by T cells subtypes, associated with better prognosis in breast cancer tissue bulk expression data. |
format | Online Article Text |
id | pubmed-8464061 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-8464061 2021-09-27 Multi-resolution characterization of molecular taxonomies in bulk and single-cell transcriptomics data Reed, Eric R Monti, Stefano Nucleic Acids Res Methods Online As high-throughput genomics assays become more efficient and cost effective, their utilization has become standard in large-scale biomedical projects. These studies are often explorative, in that relationships between samples are not explicitly defined a priori, but rather emerge from data-driven discovery and annotation of molecular subtypes, thereby informing hypotheses and independent evaluation. Here, we present K2Taxonomer, a novel unsupervised recursive partitioning algorithm and associated R package that utilize ensemble learning to identify robust subgroups in a ‘taxonomy-like’ structure. K2Taxonomer was devised to accommodate different data paradigms, and is suitable for the analysis of both bulk and single-cell transcriptomics, and other ‘-omics’, data. For each of these data types, we demonstrate the power of K2Taxonomer to discover known relationships in both simulated and human tissue data. We conclude with a practical application on breast cancer tumor infiltrating lymphocyte (TIL) single-cell profiles, in which we identified co-expression of translational machinery genes as a dominant transcriptional program shared by T cells subtypes, associated with better prognosis in breast cancer tissue bulk expression data. Oxford University Press 2021-07-06 /pmc/articles/PMC8464061/ /pubmed/34226941 http://dx.doi.org/10.1093/nar/gkab552 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of Nucleic Acids Research. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Methods Online Reed, Eric R Monti, Stefano Multi-resolution characterization of molecular taxonomies in bulk and single-cell transcriptomics data |
title | Multi-resolution characterization of molecular taxonomies in bulk and single-cell transcriptomics data |
title_full | Multi-resolution characterization of molecular taxonomies in bulk and single-cell transcriptomics data |
title_fullStr | Multi-resolution characterization of molecular taxonomies in bulk and single-cell transcriptomics data |
title_full_unstemmed | Multi-resolution characterization of molecular taxonomies in bulk and single-cell transcriptomics data |
title_short | Multi-resolution characterization of molecular taxonomies in bulk and single-cell transcriptomics data |
title_sort | multi-resolution characterization of molecular taxonomies in bulk and single-cell transcriptomics data |
topic | Methods Online |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8464061/ https://www.ncbi.nlm.nih.gov/pubmed/34226941 http://dx.doi.org/10.1093/nar/gkab552 |
work_keys_str_mv | AT reedericr multiresolutioncharacterizationofmoleculartaxonomiesinbulkandsinglecelltranscriptomicsdata AT montistefano multiresolutioncharacterizationofmoleculartaxonomiesinbulkandsinglecelltranscriptomicsdata |