Genomics data analysis via spectral shape and topology

Mapper, a topological algorithm, is frequently used as an exploratory tool to build a graphical representation of data. This representation can help to gain a better understanding of the intrinsic shape of high-dimensional genomic data and to retain information that may be lost using standard dimens...

Bibliographic Details
Main authors: Amézquita, Erik J.; Nasrin, Farzana; Storey, Kathleen M.; Yoshizawa, Masato
Format: Online Article Text
Language: English
Published: Public Library of Science 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10132553/
https://www.ncbi.nlm.nih.gov/pubmed/37099525
http://dx.doi.org/10.1371/journal.pone.0284820
author Amézquita, Erik J.
Nasrin, Farzana
Storey, Kathleen M.
Yoshizawa, Masato
collection PubMed
description Mapper, a topological algorithm, is frequently used as an exploratory tool to build a graphical representation of data. This representation can help to gain a better understanding of the intrinsic shape of high-dimensional genomic data and to retain information that may be lost using standard dimension-reduction algorithms. We propose a novel workflow to process and analyze RNA-seq data from tumor and healthy subjects integrating Mapper, differential gene expression, and spectral shape analysis. Precisely, we show that a Gaussian mixture approximation method can be used to produce graphical structures that successfully separate tumor and healthy subjects, and produce two subgroups of tumor subjects. A further analysis using DESeq2, a popular tool for the detection of differentially expressed genes, shows that these two subgroups of tumor cells bear two distinct gene regulations, suggesting two discrete paths for forming lung cancer, which could not be highlighted by other popular clustering methods, including t-distributed stochastic neighbor embedding (t-SNE). Although Mapper shows promise in analyzing high-dimensional data, tools to statistically analyze Mapper graphical structures are limited in the existing literature. In this paper, we develop a scoring method using heat kernel signatures that provides an empirical setting for statistical inferences such as hypothesis testing, sensitivity analysis, and correlation analysis.
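The abstract names heat kernel signatures as the basis of its scoring method but does not spell out the computation. As a rough illustration only, and not the authors' implementation, the sketch below computes the standard heat kernel signature HKS(v, t) = sum_k exp(-lambda_k t) phi_k(v)^2 for every node of a small graph. The function name `heat_kernel_signature`, the choice of the combinatorial Laplacian, the toy path graph, and the time values are all assumptions made for this example.

```python
import numpy as np


def heat_kernel_signature(adjacency, times):
    """Return an (n_nodes, n_times) array of heat kernel signatures.

    Assumption: uses the combinatorial Laplacian L = D - A of an
    undirected graph; the record does not say which Laplacian the
    paper uses.
    """
    adjacency = np.asarray(adjacency, dtype=float)
    degree = np.diag(adjacency.sum(axis=1))
    laplacian = degree - adjacency

    # L is symmetric, so eigh returns real eigenvalues and
    # orthonormal eigenvectors (one eigenvector per column).
    eigvals, eigvecs = np.linalg.eigh(laplacian)

    times = np.asarray(times, dtype=float)
    # decay[k, j] = exp(-lambda_k * t_j)
    decay = np.exp(-np.outer(eigvals, times))
    # HKS(v, t_j) = sum_k phi_k(v)**2 * exp(-lambda_k * t_j)
    return (eigvecs ** 2) @ decay


# Hypothetical toy input: a path graph 0 - 1 - 2 - 3.
toy_adjacency = np.array(
    [
        [0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0],
    ]
)
hks = heat_kernel_signature(toy_adjacency, times=[0.1, 1.0, 10.0])
print(hks)
```

Each row of `hks` is a multiscale descriptor of one node. Small `t` reflects the node's local neighborhood, while large `t` approaches the same value for every node of a connected graph. How such descriptors are turned into a score for a Mapper graph is described in the article itself, not in this record.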
format Online
Article
Text
id pubmed-10132553
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-10132553 2023-04-27 Genomics data analysis via spectral shape and topology Amézquita, Erik J. Nasrin, Farzana Storey, Kathleen M. Yoshizawa, Masato PLoS One Research Article
Public Library of Science 2023-04-26 /pmc/articles/PMC10132553/ /pubmed/37099525 http://dx.doi.org/10.1371/journal.pone.0284820 Text en © 2023 Amézquita et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Genomics data analysis via spectral shape and topology
topic Research Article