Statistical Enrichment Analysis of Samples: A General-Purpose Tool to Annotate Metadata Neighborhoods of Biological Samples
Unsupervised learning techniques, such as clustering and embedding, have been increasingly popular to cluster biomedical samples from high-dimensional biomedical data. Extracting clinical data or sample meta-data shared in common among biomedical samples of a given biological condition remains a maj...
Main authors: | Nguyen, Thanh M.; Bharti, Samuel; Yue, Zongliang; Willey, Christopher D.; Chen, Jake Y. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2021 |
Subjects: | Big Data |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8481385/ https://www.ncbi.nlm.nih.gov/pubmed/34604741 http://dx.doi.org/10.3389/fdata.2021.725276 |
_version_ | 1784576674631778304 |
---|---|
author | Nguyen, Thanh M. Bharti, Samuel Yue, Zongliang Willey, Christopher D. Chen, Jake Y. |
author_facet | Nguyen, Thanh M. Bharti, Samuel Yue, Zongliang Willey, Christopher D. Chen, Jake Y. |
author_sort | Nguyen, Thanh M. |
collection | PubMed |
description | Unsupervised learning techniques, such as clustering and embedding, have been increasingly popular to cluster biomedical samples from high-dimensional biomedical data. Extracting clinical data or sample meta-data shared in common among biomedical samples of a given biological condition remains a major challenge. Here, we describe a powerful analytical method called Statistical Enrichment Analysis of Samples (SEAS) for interpreting clustered or embedded sample data from omics studies. The method derives its power by focusing on sample sets, i.e., groups of biological samples that were constructed for various purposes, e.g., manual curation of samples sharing specific characteristics or automated clusters generated by embedding sample omic profiles from multi-dimensional omics space. The samples in the sample set share common clinical measurements, which we refer to as “clinotypes,” such as age group, gender, treatment status, or survival days. We demonstrate how SEAS yields insights into biological data sets using glioblastoma (GBM) samples. Notably, when analyzing the combined The Cancer Genome Atlas (TCGA)—patient-derived xenograft (PDX) data, SEAS allows approximating the different clinical outcomes of radiotherapy-treated PDX samples, which has not been solved by other tools. The result shows that SEAS may support the clinical decision. The SEAS tool is publicly available as a freely available software package at https://aimed-lab.shinyapps.io/SEAS/. |
format | Online Article Text |
id | pubmed-8481385 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-8481385 2021-10-01 Statistical Enrichment Analysis of Samples: A General-Purpose Tool to Annotate Metadata Neighborhoods of Biological Samples Nguyen, Thanh M. Bharti, Samuel Yue, Zongliang Willey, Christopher D. Chen, Jake Y. Front Big Data Big Data Unsupervised learning techniques, such as clustering and embedding, have been increasingly popular to cluster biomedical samples from high-dimensional biomedical data. Extracting clinical data or sample meta-data shared in common among biomedical samples of a given biological condition remains a major challenge. Here, we describe a powerful analytical method called Statistical Enrichment Analysis of Samples (SEAS) for interpreting clustered or embedded sample data from omics studies. The method derives its power by focusing on sample sets, i.e., groups of biological samples that were constructed for various purposes, e.g., manual curation of samples sharing specific characteristics or automated clusters generated by embedding sample omic profiles from multi-dimensional omics space. The samples in the sample set share common clinical measurements, which we refer to as “clinotypes,” such as age group, gender, treatment status, or survival days. We demonstrate how SEAS yields insights into biological data sets using glioblastoma (GBM) samples. Notably, when analyzing the combined The Cancer Genome Atlas (TCGA)–patient-derived xenograft (PDX) data, SEAS allows approximating the different clinical outcomes of radiotherapy-treated PDX samples, which has not been solved by other tools. The result shows that SEAS may support the clinical decision. The SEAS tool is publicly available as a freely available software package at https://aimed-lab.shinyapps.io/SEAS/. Frontiers Media S.A. 2021-09-16 /pmc/articles/PMC8481385/ /pubmed/34604741 http://dx.doi.org/10.3389/fdata.2021.725276 Text en Copyright © 2021 Nguyen, Bharti, Yue, Willey and Chen. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Big Data Nguyen, Thanh M. Bharti, Samuel Yue, Zongliang Willey, Christopher D. Chen, Jake Y. Statistical Enrichment Analysis of Samples: A General-Purpose Tool to Annotate Metadata Neighborhoods of Biological Samples |
title | Statistical Enrichment Analysis of Samples: A General-Purpose Tool to Annotate Metadata Neighborhoods of Biological Samples |
title_full | Statistical Enrichment Analysis of Samples: A General-Purpose Tool to Annotate Metadata Neighborhoods of Biological Samples |
title_fullStr | Statistical Enrichment Analysis of Samples: A General-Purpose Tool to Annotate Metadata Neighborhoods of Biological Samples |
title_full_unstemmed | Statistical Enrichment Analysis of Samples: A General-Purpose Tool to Annotate Metadata Neighborhoods of Biological Samples |
title_short | Statistical Enrichment Analysis of Samples: A General-Purpose Tool to Annotate Metadata Neighborhoods of Biological Samples |
title_sort | statistical enrichment analysis of samples: a general-purpose tool to annotate metadata neighborhoods of biological samples |
topic | Big Data |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8481385/ https://www.ncbi.nlm.nih.gov/pubmed/34604741 http://dx.doi.org/10.3389/fdata.2021.725276 |