Statistical Enrichment Analysis of Samples: A General-Purpose Tool to Annotate Metadata Neighborhoods of Biological Samples

Unsupervised learning techniques, such as clustering and embedding, have become increasingly popular for grouping biomedical samples from high-dimensional biomedical data. Extracting the clinical data or sample metadata shared among biomedical samples of a given biological condition remains a maj...

Bibliographic Details
Main Authors: Nguyen, Thanh M., Bharti, Samuel, Yue, Zongliang, Willey, Christopher D., Chen, Jake Y.
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2021
Subjects: Big Data
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8481385/
https://www.ncbi.nlm.nih.gov/pubmed/34604741
http://dx.doi.org/10.3389/fdata.2021.725276
author Nguyen, Thanh M.
Bharti, Samuel
Yue, Zongliang
Willey, Christopher D.
Chen, Jake Y.
collection PubMed
description Unsupervised learning techniques, such as clustering and embedding, have become increasingly popular for grouping biomedical samples from high-dimensional biomedical data. Extracting the clinical data or sample metadata shared among biomedical samples of a given biological condition remains a major challenge. Here, we describe a powerful analytical method called Statistical Enrichment Analysis of Samples (SEAS) for interpreting clustered or embedded sample data from omics studies. The method derives its power by focusing on sample sets, i.e., groups of biological samples constructed for various purposes, e.g., manual curation of samples sharing specific characteristics or automated clusters generated by embedding sample omic profiles from multi-dimensional omics space. The samples in a sample set share common clinical measurements, which we refer to as “clinotypes,” such as age group, gender, treatment status, or survival days. We demonstrate how SEAS yields insights into biological data sets using glioblastoma (GBM) samples. Notably, when analyzing the combined The Cancer Genome Atlas (TCGA) and patient-derived xenograft (PDX) data, SEAS allows approximating the different clinical outcomes of radiotherapy-treated PDX samples, which has not been solved by other tools. The result shows that SEAS may support clinical decision-making. The SEAS tool is freely available as a software package at https://aimed-lab.shinyapps.io/SEAS/.
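The idea of testing whether a clinotype is over-represented in a sample set can be illustrated with a small sketch. The abstract does not state which statistical test SEAS uses, so the one-sided hypergeometric test below is an assumption chosen for illustration, and the function names (`hypergeom_tail`, `clinotype_enrichment`) and the toy data are hypothetical, not part of the SEAS package.

```python
from math import comb


def hypergeom_tail(k, population, carriers, draws):
    """P(X >= k) for X ~ Hypergeometric(population, carriers, draws).

    Probability of seeing at least k carriers of a clinotype label when
    `draws` samples are picked at random from `population` samples,
    `carriers` of which have the label.
    """
    upper = min(carriers, draws)
    total = comb(population, draws)
    hits = sum(
        comb(carriers, i) * comb(population - carriers, draws - i)
        for i in range(k, upper + 1)
    )
    return hits / total


def clinotype_enrichment(sample_set, clinotypes):
    """Score every clinotype label for over-representation in a sample set.

    clinotypes: dict mapping sample id -> categorical label; its keys
    define the background population (an assumption of this sketch).
    Returns {label: (overlap_count, p_value)}.
    """
    background = set(clinotypes)
    members = set(sample_set) & background
    results = {}
    for label in set(clinotypes.values()):
        carriers = {s for s, v in clinotypes.items() if v == label}
        overlap = len(members & carriers)
        p = hypergeom_tail(overlap, len(background), len(carriers), len(members))
        results[label] = (overlap, p)
    return results


# Toy data (hypothetical): 10 samples, 4 of them labeled "treated".
toy_clinotypes = {f"s{i}": ("treated" if i < 4 else "untreated") for i in range(10)}
# A hypothetical cluster of 4 samples, 3 of which are treated.
toy_cluster = ["s0", "s1", "s2", "s7"]

scores = clinotype_enrichment(toy_cluster, toy_clinotypes)
for label, (overlap, p) in sorted(scores.items()):
    print(f"{label}: overlap={overlap}, p={p:.4f}")
```

In practice, a tool of this kind would also need to handle continuous clinotypes such as survival days and correct for testing many labels at once; neither is covered by this sketch.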
format Online
Article
Text
id pubmed-8481385
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-8481385 2021-10-01 Front Big Data Big Data Frontiers Media S.A. 2021-09-16 /pmc/articles/PMC8481385/ /pubmed/34604741 http://dx.doi.org/10.3389/fdata.2021.725276 Text en Copyright © 2021 Nguyen, Bharti, Yue, Willey and Chen.
https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Statistical Enrichment Analysis of Samples: A General-Purpose Tool to Annotate Metadata Neighborhoods of Biological Samples
topic Big Data
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8481385/
https://www.ncbi.nlm.nih.gov/pubmed/34604741
http://dx.doi.org/10.3389/fdata.2021.725276