A novel dimension reduction algorithm based on weighted kernel principal analysis for gene expression data

Gene expression data has the characteristics of high dimensionality and a small sample size and contains a large number of redundant genes unrelated to a disease. The direct application of machine learning to classify this type of data will not only incur a great time cost but will also sometimes fa...

Bibliographic Details
Main authors: Liu, Wen Bo, Liang, Sheng Nan, Qin, Xi Wen
Format: Online Article Text
Language: English
Published: Public Library of Science 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8513872/
https://www.ncbi.nlm.nih.gov/pubmed/34644329
http://dx.doi.org/10.1371/journal.pone.0258326
author Liu, Wen Bo
Liang, Sheng Nan
Qin, Xi Wen
collection PubMed
description Gene expression data has the characteristics of high dimensionality and a small sample size and contains a large number of redundant genes unrelated to a disease. The direct application of machine learning to classify this type of data will not only incur a great time cost but will also sometimes fail to improve classification performance. To counter this problem, this paper proposes a dimension-reduction algorithm based on weighted kernel principal component analysis (WKPCA), constructs kernel function weights according to kernel matrix eigenvalues, and combines multiple kernel functions to reduce the feature dimensions. To further improve the dimension-reduction efficiency of WKPCA, t-class kernel functions are constructed, and corresponding theoretical proofs are given. Moreover, the cumulative optimal performance rate is constructed to measure the overall performance of WKPCA combined with machine learning algorithms. Naive Bayes, K-nearest neighbour, random forest, iterative random forest and support vector machine approaches are used as classifiers to analyse 6 real gene expression datasets. Compared with the all-variable model, linear principal component dimension reduction and single kernel function dimension reduction, the results show that the classification performance of the 5 machine learning methods mentioned above can be improved effectively by WKPCA dimension reduction.
format Online
Article
Text
id pubmed-8513872
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-8513872 2021-10-14 A novel dimension reduction algorithm based on weighted kernel principal analysis for gene expression data Liu, Wen Bo Liang, Sheng Nan Qin, Xi Wen PLoS One Research Article Gene expression data has the characteristics of high dimensionality and a small sample size and contains a large number of redundant genes unrelated to a disease. The direct application of machine learning to classify this type of data will not only incur a great time cost but will also sometimes fail to improve classification performance. To counter this problem, this paper proposes a dimension-reduction algorithm based on weighted kernel principal component analysis (WKPCA), constructs kernel function weights according to kernel matrix eigenvalues, and combines multiple kernel functions to reduce the feature dimensions. To further improve the dimension-reduction efficiency of WKPCA, t-class kernel functions are constructed, and corresponding theoretical proofs are given. Moreover, the cumulative optimal performance rate is constructed to measure the overall performance of WKPCA combined with machine learning algorithms. Naive Bayes, K-nearest neighbour, random forest, iterative random forest and support vector machine approaches are used as classifiers to analyse 6 real gene expression datasets. Compared with the all-variable model, linear principal component dimension reduction and single kernel function dimension reduction, the results show that the classification performance of the 5 machine learning methods mentioned above can be improved effectively by WKPCA dimension reduction.
Public Library of Science 2021-10-13 /pmc/articles/PMC8513872/ /pubmed/34644329 http://dx.doi.org/10.1371/journal.pone.0258326 Text en © 2021 Liu et al https://creativecommons.org/licenses/by/4.0/This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title A novel dimension reduction algorithm based on weighted kernel principal analysis for gene expression data
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8513872/
https://www.ncbi.nlm.nih.gov/pubmed/34644329
http://dx.doi.org/10.1371/journal.pone.0258326
work_keys_str_mv AT liuwenbo anoveldimensionreductionalgorithmbasedonweightedkernelprincipalanalysisforgeneexpressiondata
AT liangshengnan anoveldimensionreductionalgorithmbasedonweightedkernelprincipalanalysisforgeneexpressiondata
AT qinxiwen anoveldimensionreductionalgorithmbasedonweightedkernelprincipalanalysisforgeneexpressiondata
AT liuwenbo noveldimensionreductionalgorithmbasedonweightedkernelprincipalanalysisforgeneexpressiondata
AT liangshengnan noveldimensionreductionalgorithmbasedonweightedkernelprincipalanalysisforgeneexpressiondata
AT qinxiwen noveldimensionreductionalgorithmbasedonweightedkernelprincipalanalysisforgeneexpressiondata
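
Illustrative sketch (not taken from the article): the description above says that WKPCA builds kernel weights from kernel matrix eigenvalues and combines several kernel functions before reducing the dimension. The Python below shows one possible reading of that idea. The weighting rule (share of each kernel's spectrum captured by its leading eigenvalues), the choice of linear, polynomial and RBF kernels, the trace normalisation and all function names are assumptions made for illustration; the paper's actual weight formula and its t-class kernels are not given in this record and are not reproduced here.

```python
import numpy as np

def rbf_kernel(X, gamma):
    # Gaussian kernel from pairwise squared Euclidean distances.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def poly_kernel(X, degree=2, coef0=1.0):
    return (X @ X.T + coef0) ** degree

def center_kernel(K):
    # Double-centre K so the implicit features have zero mean.
    n = K.shape[0]
    one = np.full((n, n), 1.0 / n)
    return K - one @ K - K @ one + one @ K @ one

def eigen_weights(kernels, n_components):
    # ASSUMED rule: score each kernel by the fraction of its spectrum
    # held by the top n_components eigenvalues, then normalise to sum 1.
    scores = []
    for K in kernels:
        vals = np.clip(np.linalg.eigvalsh(K)[::-1], 0.0, None)
        scores.append(vals[:n_components].sum() / vals.sum())
    scores = np.array(scores)
    return scores / scores.sum()

def weighted_kpca(X, n_components=2):
    raw = (X @ X.T, poly_kernel(X), rbf_kernel(X, 1.0 / X.shape[1]))
    kernels = [center_kernel(K) for K in raw]
    # Trace-normalise so kernels on different scales are comparable.
    kernels = [K / np.trace(K) for K in kernels]
    w = eigen_weights(kernels, n_components)
    K = sum(wi * Ki for wi, Ki in zip(w, kernels))
    vals, vecs = np.linalg.eigh(K)
    idx = np.argsort(vals)[::-1][:n_components]
    # Training-sample scores: unit eigenvectors scaled by sqrt(eigenvalue).
    Z = vecs[:, idx] * np.sqrt(np.clip(vals[idx], 0.0, None))
    return Z, w

# Synthetic stand-in for expression data: 20 samples, 200 "genes".
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 200))
Z, w = weighted_kpca(X, n_components=3)
print(Z.shape, w.round(3))
```

The reduced matrix Z could then be passed to any of the classifiers named in the description; how the article tunes kernels or selects the number of components is not stated in this record.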