Classification and prediction for multi-cancer data with ultrahigh-dimensional gene expressions

Analysis of gene expression data is an attractive topic in the field of bioinformatics, and a typical application is to classify and predict individuals’ diseases or tumors by treating gene expression values as predictors. A primary challenge of this study comes from ultrahigh-dimensionality, which...

Bibliographic Details
Main Author:	Chen, Li-Pang
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2022
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9477337/ https://www.ncbi.nlm.nih.gov/pubmed/36107929 http://dx.doi.org/10.1371/journal.pone.0274440

author	Chen, Li-Pang
collection	PubMed
description	Analysis of gene expression data is an attractive topic in the field of bioinformatics, and a typical application is to classify and predict individuals’ diseases or tumors by treating gene expression values as predictors. A primary challenge of this study comes from ultrahigh-dimensionality, which means that (i) many predictors in the dataset might be non-informative and (ii) pairwise dependence structures may exist among high-dimensional predictors, yielding a network structure. While many supervised learning methods have been developed, prediction performance is expected to be affected if the impacts of ultrahigh-dimensionality are not carefully addressed. In this paper, we propose a new statistical learning algorithm to deal with multi-classification subject to ultrahigh-dimensional gene expressions. In the proposed algorithm, we employ a model-free feature screening method to retain informative gene expression values from ultrahigh-dimensional data, and then construct predictive models that accommodate the network structures of the selected gene expressions. Unlike existing supervised learning methods that build predictive models on the entire dataset, our approach is able to identify informative predictors and dependence structures for gene expression. Through analysis of a real dataset, we find that the proposed algorithm gives precise classification as well as accurate prediction, and outperforms some commonly used supervised learning methods.
format	Online Article Text
id	pubmed-9477337
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-9477337 2022-09-16 Classification and prediction for multi-cancer data with ultrahigh-dimensional gene expressions Chen, Li-Pang PLoS One Research Article
Public Library of Science 2022-09-15 /pmc/articles/PMC9477337/ /pubmed/36107929 http://dx.doi.org/10.1371/journal.pone.0274440 Text en © 2022 Li-Pang Chen https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title	Classification and prediction for multi-cancer data with ultrahigh-dimensional gene expressions
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9477337/ https://www.ncbi.nlm.nih.gov/pubmed/36107929 http://dx.doi.org/10.1371/journal.pone.0274440