A Topological Data Analysis Approach on Predicting Phenotypes from Gene Expression Data

The goal of this study was to investigate if gene expression measured from RNA sequencing contains enough signal to separate healthy and afflicted individuals in the context of phenotype prediction. We observed that standard machine learning methods alone performed somewhat poorly on the disease phe...

Bibliographic Details
Main Authors: Mandal, Sayan; Guzmán-Sáenz, Aldo; Haiminen, Niina; Basu, Saugata; Parida, Laxmi
Format: Online Article Text
Language: English
Published: 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7197058/
http://dx.doi.org/10.1007/978-3-030-42266-0_14
_version_ 1783528808686551040
author Mandal, Sayan
Guzmán-Sáenz, Aldo
Haiminen, Niina
Basu, Saugata
Parida, Laxmi
author_facet Mandal, Sayan
Guzmán-Sáenz, Aldo
Haiminen, Niina
Basu, Saugata
Parida, Laxmi
author_sort Mandal, Sayan
collection PubMed
description The goal of this study was to investigate if gene expression measured from RNA sequencing contains enough signal to separate healthy and afflicted individuals in the context of phenotype prediction. We observed that standard machine learning methods alone performed somewhat poorly on the disease phenotype prediction task; therefore we devised an approach augmenting machine learning with topological data analysis. We describe a framework for predicting phenotype values by utilizing gene expression data transformed into sample-specific topological signatures by employing feature subsampling and persistent homology. The topological data analysis approach developed in this work yielded improved results on Parkinson’s disease phenotype prediction when measured against standard machine learning methods. This study confirms that gene expression can be a useful indicator of the presence or absence of a condition, and the subtle signal contained in this high dimensional data reveals itself when considering the intricate topological connections between expressed genes.
format Online
Article
Text
id pubmed-7197058
institution National Center for Biotechnology Information
language English
publishDate 2020
record_format MEDLINE/PubMed
spelling pubmed-7197058 2020-05-04 A Topological Data Analysis Approach on Predicting Phenotypes from Gene Expression Data Mandal, Sayan Guzmán-Sáenz, Aldo Haiminen, Niina Basu, Saugata Parida, Laxmi Algorithms for Computational Biology Article 2020-02-01 /pmc/articles/PMC7197058/ http://dx.doi.org/10.1007/978-3-030-42266-0_14 Text en © Springer Nature Switzerland AG 2020 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
spellingShingle Article
Mandal, Sayan
Guzmán-Sáenz, Aldo
Haiminen, Niina
Basu, Saugata
Parida, Laxmi
A Topological Data Analysis Approach on Predicting Phenotypes from Gene Expression Data
title A Topological Data Analysis Approach on Predicting Phenotypes from Gene Expression Data
title_sort topological data analysis approach on predicting phenotypes from gene expression data
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7197058/
http://dx.doi.org/10.1007/978-3-030-42266-0_14