A Review of Matched-pairs Feature Selection Methods for Gene Expression Data Analysis
With the rapid accumulation of gene expression data from various technologies, e.g., microarray, RNA-sequencing (RNA-seq), and single-cell RNA-seq, it is necessary to carry out dimensionality reduction and feature (signature gene) selection to make sense of such high-dimensional data....
Main Authors: | Liang, Sen; Ma, Anjun; Yang, Sen; Wang, Yan; Ma, Qin |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Research Network of Computational and Structural Biotechnology 2018 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6158772/ https://www.ncbi.nlm.nih.gov/pubmed/30275937 http://dx.doi.org/10.1016/j.csbj.2018.02.005 |
_version_ | 1783358485125136384 |
---|---|
author | Liang, Sen; Ma, Anjun; Yang, Sen; Wang, Yan; Ma, Qin |
author_facet | Liang, Sen; Ma, Anjun; Yang, Sen; Wang, Yan; Ma, Qin |
author_sort | Liang, Sen |
collection | PubMed |
description | With the rapid accumulation of gene expression data from various technologies, e.g., microarray, RNA-sequencing (RNA-seq), and single-cell RNA-seq, it is necessary to carry out dimensionality reduction and feature (signature gene) selection to make sense of such high-dimensional data. These computational methods significantly facilitate further data analysis and interpretation, such as gene function enrichment analysis, cancer biomarker detection, and drug target identification in precision medicine. Although numerous methods have been developed for feature selection in bioinformatics, it is still a challenge to choose the appropriate method for a specific problem and to obtain the most reasonable feature ranking. Meanwhile, paired gene expression data under the matched case-control design (MCCD) are becoming increasingly popular; such data have often been used in multi-omics integration studies and may increase feature selection efficiency by offsetting similar distributions of confounding features. However, feature selection methods specifically designed for paired data, termed matched-pairs feature selection (MPFS), have not matured in parallel. In this review, we compare the performance of 10 feature selection methods (eight MPFS methods and two traditional unpaired methods) on two real datasets by applying three classification methods, and analyze the algorithmic complexity of these methods by running their programs. This review aims to summarize and comprehensively present MPFS in such a way that readers can easily understand its characteristics and find guidance in selecting appropriate methods for their analyses. |
format | Online Article Text |
id | pubmed-6158772 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | Research Network of Computational and Structural Biotechnology |
record_format | MEDLINE/PubMed |
spelling | pubmed-6158772 2018-10-01 A Review of Matched-pairs Feature Selection Methods for Gene Expression Data Analysis Liang, Sen; Ma, Anjun; Yang, Sen; Wang, Yan; Ma, Qin Comput Struct Biotechnol J Review Article Research Network of Computational and Structural Biotechnology 2018-02-25 /pmc/articles/PMC6158772/ /pubmed/30275937 http://dx.doi.org/10.1016/j.csbj.2018.02.005 Text en © 2018 The Authors http://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Review Article Liang, Sen Ma, Anjun Yang, Sen Wang, Yan Ma, Qin A Review of Matched-pairs Feature Selection Methods for Gene Expression Data Analysis |
title | A Review of Matched-pairs Feature Selection Methods for Gene Expression Data Analysis |
title_full | A Review of Matched-pairs Feature Selection Methods for Gene Expression Data Analysis |
title_fullStr | A Review of Matched-pairs Feature Selection Methods for Gene Expression Data Analysis |
title_full_unstemmed | A Review of Matched-pairs Feature Selection Methods for Gene Expression Data Analysis |
title_short | A Review of Matched-pairs Feature Selection Methods for Gene Expression Data Analysis |
title_sort | review of matched-pairs feature selection methods for gene expression data analysis |
topic | Review Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6158772/ https://www.ncbi.nlm.nih.gov/pubmed/30275937 http://dx.doi.org/10.1016/j.csbj.2018.02.005 |