Classification and prediction for multi-cancer data with ultrahigh-dimensional gene expressions

Analysis of gene expression data is an attractive topic in the field of bioinformatics, and a typical application is to classify and predict individuals’ diseases or tumors by treating gene expression values as predictors. A primary challenge of this study comes from ultrahigh-dimensionality, which...

Bibliographic Details
Main author: Chen, Li-Pang
Format: Online Article Text
Language: English
Published: Public Library of Science 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9477337/
https://www.ncbi.nlm.nih.gov/pubmed/36107929
http://dx.doi.org/10.1371/journal.pone.0274440
_version_ 1784790338179694592
author Chen, Li-Pang
collection PubMed
description Analysis of gene expression data is an attractive topic in the field of bioinformatics, and a typical application is to classify and predict individuals’ diseases or tumors by treating gene expression values as predictors. A primary challenge comes from ultrahigh-dimensionality, which means that (i) many predictors in the dataset might be non-informative, and (ii) pairwise dependence structures may exist among the high-dimensional predictors, yielding a network structure. While many supervised learning methods have been developed, prediction performance is expected to suffer if the impacts of ultrahigh-dimensionality are not carefully addressed. In this paper, we propose a new statistical learning algorithm for multi-class classification with ultrahigh-dimensional gene expressions. The proposed algorithm employs a model-free feature screening method to retain informative gene expression values from ultrahigh-dimensional data, and then constructs predictive models that accommodate the network structures of the selected gene expressions. Unlike existing supervised learning methods that build predictive models on the entire dataset, our approach is able to identify informative predictors and dependence structures for gene expression. Through analysis of a real dataset, we find that the proposed algorithm gives precise classification as well as accurate prediction, and outperforms some commonly used supervised learning methods.
format Online
Article
Text
id pubmed-9477337
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-9477337 2022-09-16 Classification and prediction for multi-cancer data with ultrahigh-dimensional gene expressions Chen, Li-Pang PLoS One Research Article
Public Library of Science 2022-09-15 /pmc/articles/PMC9477337/ /pubmed/36107929 http://dx.doi.org/10.1371/journal.pone.0274440 Text en © 2022 Li-Pang Chen https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Classification and prediction for multi-cancer data with ultrahigh-dimensional gene expressions
topic Research Article