
Efficient feature extraction from highly sparse binary genotype data for cancer prognosis prediction using an auto-encoder

Genomics, which involves tens of thousands of genes, is a complex system that determines phenotype. An important open question is how to integrate highly sparse genomic data, in which many variants each have a minor effect, into a prediction model to improve its predictive power. We find that the deep learning method c...


Bibliographic Details
Main Authors:	Shen, Junjie; Li, Huijun; Yu, Xinghao; Bai, Lu; Dong, Yongfei; Cao, Jianping; Lu, Ke; Tang, Zaixiang
Format:	Online Article Text
Language:	English
Published:	Frontiers Media S.A. 2023
Subjects:	Oncology
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9872139/ https://www.ncbi.nlm.nih.gov/pubmed/36703783 http://dx.doi.org/10.3389/fonc.2022.1091767

author	Shen, Junjie; Li, Huijun; Yu, Xinghao; Bai, Lu; Dong, Yongfei; Cao, Jianping; Lu, Ke; Tang, Zaixiang
collection	PubMed
description	Genomics, which involves tens of thousands of genes, is a complex system that determines phenotype. An important open question is how to integrate highly sparse genomic data, in which many variants each have a minor effect, into a prediction model to improve its predictive power. We find that deep learning can extract features effectively by transforming highly sparse dichotomous data into lower-dimensional continuous data in a non-linear way, which may benefit risk prediction based on genotype data. We developed a multi-stage strategy to extract information from highly sparse binary genotype data and applied it to cancer prognosis. Specifically, we first reduced the set of binary biomarkers to a moderate size using a univariable regression model. Then, a trainable auto-encoder was used to learn compact features from the reduced data. Next, we applied LASSO to select the optimal combination of extracted features. Lastly, we applied this feature combination to real cancer prognostic models and evaluated the models' predictive performance. The results indicated that these compressed, transformed features improved on the models' original predictive performance and might help avoid overfitting. This idea may be useful to anyone involved in cancer research, risk reduction, treatment, and patient care who works on integrating genomics data.
format	Online Article Text
id	pubmed-9872139
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Frontiers Media S.A.
record_format	MEDLINE/PubMed
spelling	pubmed-9872139 2023-01-25 Front Oncol Oncology Frontiers Media S.A. 2023-01-10 /pmc/articles/PMC9872139/ /pubmed/36703783 http://dx.doi.org/10.3389/fonc.2022.1091767 Text en Copyright © 2023 Shen, Li, Yu, Bai, Dong, Cao, Lu and Tang https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title	Efficient feature extraction from highly sparse binary genotype data for cancer prognosis prediction using an auto-encoder
topic	Oncology
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9872139/ https://www.ncbi.nlm.nih.gov/pubmed/36703783 http://dx.doi.org/10.3389/fonc.2022.1091767
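
The multi-stage strategy summarised in the description (screening, auto-encoder, LASSO) can be illustrated with a small sketch. This is not the authors' implementation, and several parts are assumptions made only for illustration: the data are synthetic, the outcome is continuous rather than a survival time, the univariable regression step is replaced by a simple correlation ranking, and every name (`screen`, `train_autoencoder`, `lasso_cd`) and hyperparameter is hypothetical. The final stage, adding the selected features to a prognostic model, is left out.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for sparse binary genotype data (all values are made up).
n, p = 200, 500
X = (rng.random((n, p)) < 0.05).astype(float)   # roughly 5% of entries are 1
beta = np.zeros(p)
beta[:20] = 0.8                                 # 20 markers with modest effects
y = X @ beta + rng.normal(0.0, 1.0, n)          # continuous outcome (assumption)


def screen(X, y, keep):
    """Stage 1 (assumed variant): rank markers by |correlation| with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12
    score = np.abs(Xc.T @ yc) / denom
    return np.argsort(score)[::-1][:keep]       # indices of the top markers


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_autoencoder(X, k, epochs=300, lr=0.2, seed=0):
    """Stage 2: one-hidden-layer auto-encoder, full-batch gradient descent."""
    r = np.random.default_rng(seed)
    n, p = X.shape
    W1 = r.normal(0.0, 0.1, (p, k))
    b1 = np.zeros(k)
    W2 = r.normal(0.0, 0.1, (k, p))
    b2 = np.zeros(p)
    eps = 1e-9
    losses = []
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)                # encoder: binary -> continuous code
        R = sigmoid(H @ W2 + b2)                # decoder: reconstruction in (0, 1)
        # Cross-entropy averaged over all entries, recorded for monitoring.
        losses.append(-np.mean(X * np.log(R + eps) + (1 - X) * np.log(1 - R + eps)))
        # Gradient of the cross-entropy (summed over features, averaged over
        # samples) with respect to the decoder pre-activation.
        dZ2 = (R - X) / n
        dZ1 = (dZ2 @ W2.T) * H * (1 - H)        # back-propagate through the encoder
        W2 -= lr * (H.T @ dZ2)
        b2 -= lr * dZ2.sum(axis=0)
        W1 -= lr * (X.T @ dZ1)
        b1 -= lr * dZ1.sum(axis=0)
    return (W1, b1), losses


def lasso_cd(Z, y, lam, iters=200):
    """Stage 3: LASSO by cyclic coordinate descent on standardised columns."""
    Z = (Z - Z.mean(axis=0)) / (Z.std(axis=0) + 1e-12)
    yc = y - y.mean()
    n, k = Z.shape
    w = np.zeros(k)
    for _ in range(iters):
        for j in range(k):
            resid = yc - Z @ w + Z[:, j] * w[j]  # partial residual without feature j
            rho = Z[:, j] @ resid / n
            w[j] = np.sign(rho) * max(abs(rho) - lam, 0.0)  # soft-thresholding
    return w


idx = screen(X, y, keep=50)                      # 500 markers -> 50
(W1, b1), losses = train_autoencoder(X[:, idx], k=8)
codes = sigmoid(X[:, idx] @ W1 + b1)             # 50 binary markers -> 8 continuous features
w = lasso_cd(codes, y, lam=0.05)
selected = np.flatnonzero(w)                     # compressed features kept by LASSO
print(codes.shape, selected)
```

In practice the screening and the LASSO step would be fitted against the actual prognostic outcome (for example with a survival model), and the penalty `lam` and code size `k` would be chosen by cross-validation; the fixed values here are placeholders.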
