
A robust fuzzy rule based integrative feature selection strategy for gene expression data in TCGA

BACKGROUND: Much research has been conducted on selecting gene signatures that can distinguish cancer patients from normal controls. However, how to extract robust gene features remains an open question. METHODS: In this work, a gene signature selection strategy for TCGA dat...


Bibliographic Details
Main authors: Fan, Shicai, Tang, Jianxiong, Tian, Qi, Wu, Chunguo
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects: Research
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6357346/
https://www.ncbi.nlm.nih.gov/pubmed/30704464
http://dx.doi.org/10.1186/s12920-018-0451-x
author Fan, Shicai
Tang, Jianxiong
Tian, Qi
Wu, Chunguo
collection PubMed
description BACKGROUND: Much research has been conducted on selecting gene signatures that can distinguish cancer patients from normal controls. However, how to extract robust gene features remains an open question. METHODS: In this work, a gene signature selection strategy for TCGA data was proposed that integrates gene expression data, methylation data, and prior knowledge about cancer biomarkers. Unlike traditional integration methods, the expanded 450 K methylation data were used instead of the original 450 K array data, and the reported biomarkers were weighted in the feature selection. A fuzzy rule based classification method and a cross validation strategy were applied in model construction for performance evaluation. RESULTS: Our selected gene features showed prediction accuracy close to 100% in cross validation with the fuzzy rule based classification model on 6 cancers from TCGA. The cross validation performance of our proposed model is similar to that of other integrative models or the RNA-seq only model, while its prediction performance on independent data is clearly better than that of the other 5 models. The gene signatures extracted with our fuzzy rule based integrative feature selection strategy were more robust and had the potential to yield better prediction results. CONCLUSION: The results indicated that integrating the expanded methylation data covers more genes and has greater capacity to retrieve signature genes than the original 450 K methylation data. Also, integrating the reported biomarkers was a promising way to improve performance. The PTCHD3 gene was selected as a discriminating gene in 3 of the 6 cancers, which suggests that it might play an important role in cancer risk and is worthy of intensive investigation.
format Online
Article
Text
id pubmed-6357346
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6357346 2019-02-07 A robust fuzzy rule based integrative feature selection strategy for gene expression data in TCGA Fan, Shicai Tang, Jianxiong Tian, Qi Wu, Chunguo BMC Med Genomics Research BioMed Central 2019-01-31 /pmc/articles/PMC6357346/ /pubmed/30704464 http://dx.doi.org/10.1186/s12920-018-0451-x Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title A robust fuzzy rule based integrative feature selection strategy for gene expression data in TCGA
topic Research