Classification and prediction for multi-cancer data with ultrahigh-dimensional gene expressions
Analysis of gene expression data is an attractive topic in the field of bioinformatics, and a typical application is to classify and predict individuals’ diseases or tumors by treating gene expression values as predictors. A primary challenge of this study comes from ultrahigh-dimensionality, which...
Main Author: | Chen, Li-Pang |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2022 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9477337/ https://www.ncbi.nlm.nih.gov/pubmed/36107929 http://dx.doi.org/10.1371/journal.pone.0274440 |
_version_ | 1784790338179694592 |
---|---|
author | Chen, Li-Pang |
author_facet | Chen, Li-Pang |
author_sort | Chen, Li-Pang |
collection | PubMed |
description | Analysis of gene expression data is an attractive topic in bioinformatics, and a typical application is to classify and predict individuals’ diseases or tumors by treating gene expression values as predictors. A primary challenge comes from ultrahigh-dimensionality, which implies that (i) many predictors in the dataset might be non-informative, and (ii) pairwise dependence structures possibly exist among the high-dimensional predictors, yielding a network structure. While many supervised learning methods have been developed, prediction performance is expected to suffer if the impacts of ultrahigh-dimensionality are not carefully addressed. In this paper, we propose a new statistical learning algorithm for multi-class classification with ultrahigh-dimensional gene expressions. In the proposed algorithm, we employ a model-free feature screening method to retain informative gene expression values from ultrahigh-dimensional data, and then construct predictive models that accommodate the network structures of the selected gene expressions. Unlike existing supervised learning methods that build predictive models on the entire dataset, our approach is able to identify informative predictors and dependence structures for gene expression. Through analysis of a real dataset, we find that the proposed algorithm gives precise classification as well as accurate prediction, and outperforms some commonly used supervised learning methods. |
format | Online Article Text |
id | pubmed-9477337 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-9477337 2022-09-16 Classification and prediction for multi-cancer data with ultrahigh-dimensional gene expressions Chen, Li-Pang PLoS One Research Article Public Library of Science 2022-09-15 /pmc/articles/PMC9477337/ /pubmed/36107929 http://dx.doi.org/10.1371/journal.pone.0274440 Text en © 2022 Li-Pang Chen https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Chen, Li-Pang Classification and prediction for multi-cancer data with ultrahigh-dimensional gene expressions |
title | Classification and prediction for multi-cancer data with ultrahigh-dimensional gene expressions |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9477337/ https://www.ncbi.nlm.nih.gov/pubmed/36107929 http://dx.doi.org/10.1371/journal.pone.0274440 |
work_keys_str_mv | AT chenlipang classificationandpredictionformulticancerdatawithultrahighdimensionalgeneexpressions |