A robust fuzzy rule based integrative feature selection strategy for gene expression data in TCGA
BACKGROUND: Much research has been conducted on selecting gene signatures that can distinguish cancer patients from normal individuals. However, how to extract robust gene features remains an open question. METHODS: In this work, a gene signature selection strategy for TCGA dat...
Main authors: | Fan, Shicai; Tang, Jianxiong; Tian, Qi; Wu, Chunguo |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2019 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6357346/ https://www.ncbi.nlm.nih.gov/pubmed/30704464 http://dx.doi.org/10.1186/s12920-018-0451-x |
_version_ | 1783391765023162368 |
---|---|
author | Fan, Shicai Tang, Jianxiong Tian, Qi Wu, Chunguo |
author_sort | Fan, Shicai |
collection | PubMed |
description | BACKGROUND: Much research has been conducted on selecting gene signatures that can distinguish cancer patients from normal individuals. However, how to extract robust gene features remains an open question. METHODS: In this work, a gene signature selection strategy for TCGA data was proposed that integrates gene expression data, methylation data, and prior knowledge about cancer biomarkers. Unlike traditional integration methods, expanded 450 K methylation data were used instead of the original 450 K array data, and reported biomarkers were weighted in the feature selection. A fuzzy rule based classification method and a cross-validation strategy were applied in model construction for performance evaluation. RESULTS: Our selected gene features showed prediction accuracy close to 100% in cross-validation with the fuzzy rule based classification model on 6 cancers from TCGA. The cross-validation performance of our proposed model is similar to that of other integrative models and the RNA-seq only model, while its prediction performance on independent data is clearly better than that of the other 5 models. The gene signatures extracted with our fuzzy rule based integrative feature selection strategy were more robust and had the potential to yield better prediction results. CONCLUSION: The results indicated that integrating expanded methylation data covers more genes and has a greater capacity to retrieve signature genes than the original 450 K methylation data. In addition, integrating reported biomarkers was a promising way to improve performance. The PTCHD3 gene was selected as a discriminating gene in 3 of the 6 cancers, which suggests that it might play an important role in cancer risk and merits further investigation. |
format | Online Article Text |
id | pubmed-6357346 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6357346 2019-02-07 BMC Med Genomics Research BioMed Central 2019-01-31 /pmc/articles/PMC6357346/ /pubmed/30704464 http://dx.doi.org/10.1186/s12920-018-0451-x Text en © The Author(s). 2019 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | A robust fuzzy rule based integrative feature selection strategy for gene expression data in TCGA |
topic | Research |