A Review of Matched-pairs Feature Selection Methods for Gene Expression Data Analysis

With the rapid accumulation of gene expression data from various technologies, e.g., microarray, RNA-sequencing (RNA-seq), and single-cell RNA-seq, it is necessary to carry out dimensional reduction and feature (signature genes) selection in support of making sense out of such high dimensional data....

Bibliographic Details
Main authors: Liang, Sen, Ma, Anjun, Yang, Sen, Wang, Yan, Ma, Qin
Format: Online Article Text
Language: English
Published: Research Network of Computational and Structural Biotechnology 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6158772/
https://www.ncbi.nlm.nih.gov/pubmed/30275937
http://dx.doi.org/10.1016/j.csbj.2018.02.005
_version_ 1783358485125136384
author Liang, Sen
Ma, Anjun
Yang, Sen
Wang, Yan
Ma, Qin
author_facet Liang, Sen
Ma, Anjun
Yang, Sen
Wang, Yan
Ma, Qin
author_sort Liang, Sen
collection PubMed
description With the rapid accumulation of gene expression data from various technologies, e.g., microarray, RNA-sequencing (RNA-seq), and single-cell RNA-seq, it is necessary to carry out dimensionality reduction and feature (signature gene) selection to make sense of such high-dimensional data. These computational methods significantly facilitate further data analysis and interpretation, such as gene function enrichment analysis, cancer biomarker detection, and drug target identification in precision medicine. Although numerous methods have been developed for feature selection in bioinformatics, it is still a challenge to choose the appropriate methods for a specific problem and to identify the most reasonably ranked features. Meanwhile, paired gene expression data under a matched case-control design (MCCD) are becoming increasingly popular; such data have often been used in multi-omics integration studies and may increase feature selection efficiency by offsetting similar distributions of confounding features. However, appropriate feature selection methods specifically designed for paired data, termed matched-pairs feature selection (MPFS), have not matured in parallel. In this review, we compare the performance of 10 feature-selection methods (eight MPFS methods and two traditional unpaired methods) on two real datasets by applying three classification methods, and we analyze the algorithmic complexity of these methods by running their programs. This review aims to summarize and comprehensively present MPFS in such a way that readers can easily understand its characteristics and find guidance in selecting the appropriate methods for their analyses.
format Online
Article
Text
id pubmed-6158772
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Research Network of Computational and Structural Biotechnology
record_format MEDLINE/PubMed
spelling pubmed-6158772 2018-10-01 A Review of Matched-pairs Feature Selection Methods for Gene Expression Data Analysis Liang, Sen Ma, Anjun Yang, Sen Wang, Yan Ma, Qin Comput Struct Biotechnol J Review Article
Research Network of Computational and Structural Biotechnology 2018-02-25 /pmc/articles/PMC6158772/ /pubmed/30275937 http://dx.doi.org/10.1016/j.csbj.2018.02.005 Text en © 2018 The Authors http://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Review Article
Liang, Sen
Ma, Anjun
Yang, Sen
Wang, Yan
Ma, Qin
A Review of Matched-pairs Feature Selection Methods for Gene Expression Data Analysis
title A Review of Matched-pairs Feature Selection Methods for Gene Expression Data Analysis
title_full A Review of Matched-pairs Feature Selection Methods for Gene Expression Data Analysis
title_fullStr A Review of Matched-pairs Feature Selection Methods for Gene Expression Data Analysis
title_full_unstemmed A Review of Matched-pairs Feature Selection Methods for Gene Expression Data Analysis
title_short A Review of Matched-pairs Feature Selection Methods for Gene Expression Data Analysis
title_sort review of matched-pairs feature selection methods for gene expression data analysis
topic Review Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6158772/
https://www.ncbi.nlm.nih.gov/pubmed/30275937
http://dx.doi.org/10.1016/j.csbj.2018.02.005
work_keys_str_mv AT liangsen areviewofmatchedpairsfeatureselectionmethodsforgeneexpressiondataanalysis
AT maanjun areviewofmatchedpairsfeatureselectionmethodsforgeneexpressiondataanalysis
AT yangsen areviewofmatchedpairsfeatureselectionmethodsforgeneexpressiondataanalysis
AT wangyan areviewofmatchedpairsfeatureselectionmethodsforgeneexpressiondataanalysis
AT maqin areviewofmatchedpairsfeatureselectionmethodsforgeneexpressiondataanalysis
AT liangsen reviewofmatchedpairsfeatureselectionmethodsforgeneexpressiondataanalysis
AT maanjun reviewofmatchedpairsfeatureselectionmethodsforgeneexpressiondataanalysis
AT yangsen reviewofmatchedpairsfeatureselectionmethodsforgeneexpressiondataanalysis
AT wangyan reviewofmatchedpairsfeatureselectionmethodsforgeneexpressiondataanalysis
AT maqin reviewofmatchedpairsfeatureselectionmethodsforgeneexpressiondataanalysis