
Classification and Biomarker Genes Selection for Cancer Gene Expression Data Using Random Forest

BACKGROUND & OBJECTIVE: Microarray and next-generation sequencing (NGS) data are important sources for finding useful molecular patterns. The large volume of gene expression data also makes it harder to identify the biomarkers associated with cancer. The random forest (RF) is us...


Bibliographic Details
Main authors:	Ram, Malihe; Najafi, Ali; Shakeri, Mohammad Taghi
Format:	Online Article Text
Language:	English
Published:	Iranian Society of Pathology 2017
Subjects:	Original Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5844678/ https://www.ncbi.nlm.nih.gov/pubmed/29563929

_version_	1783305282610266112
author	Ram, Malihe; Najafi, Ali; Shakeri, Mohammad Taghi
collection	PubMed
description	BACKGROUND & OBJECTIVE: Microarray and next-generation sequencing (NGS) data are important sources for finding useful molecular patterns. The large volume of gene expression data also makes it harder to identify the biomarkers associated with cancer. The random forest (RF) method handles large-p, small-n problems effectively, so it can be used to select and rank genes for the diagnosis and effective treatment of cancer. METHODS: Microarray gene expression data for colon, leukemia, and prostate cancers were collected from public databases. The data were first preprocessed with the limma package, and the RF classification method was then applied to each data set separately in R. Finally, the genes selected for each cancer were compared with those reported in previous experimental studies, and their functions in molecular cancer processes were assessed. RESULT: The RF method extracted very small sets of genes while retaining its predictive performance. For the colon cancer data set, the key genes DIEXF, GUCA2A, CA7, and IGHA1 were selected, with an accuracy of 87.39 and a precision of 85.45. For prostate cancer, the SNCA, USP20, and SNRPA1 genes were selected, with an accuracy of 73.33 and a precision of 66.67. For the leukemia data set, the key genes were BAG4, ANKHD1-EIF4EBP3, PLXNC1, and PCDH9, with an accuracy of 100 and a precision of 95.24. CONCLUSION: Most of the selected genes are involved in cancer-related processes and pathways, had been reported previously, and play an important role in the shift from normal to abnormal cells.
format	Online Article Text
id	pubmed-5844678
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	Iranian Society of Pathology
record_format	MEDLINE/PubMed
spelling	pubmed-5844678 2018-03-21 Iran J Pathol Original Article Iranian Society of Pathology 2017 2017-10-01 /pmc/articles/PMC5844678/ /pubmed/29563929 Text en © 2017, IRANIAN JOURNAL OF PATHOLOGY. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Classification and Biomarker Genes Selection for Cancer Gene Expression Data Using Random Forest
topic	Original Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5844678/ https://www.ncbi.nlm.nih.gov/pubmed/29563929
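
The abstract in this record reports an accuracy and a precision for each data set. As a reminder of how these two metrics are usually computed from predicted and true class labels, here is a minimal sketch. It assumes the reported values are percentages and that precision is taken with respect to one "positive" class; the function name and the example labels are hypothetical and are not taken from the paper.

```python
def accuracy_and_precision(y_true, y_pred, positive=1):
    """Return (accuracy, precision) as percentages for two label lists.

    accuracy  = correct predictions / all predictions
    precision = true positives / (true positives + false positives)
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    pairs = list(zip(y_true, y_pred))
    correct = sum(1 for t, p in pairs if t == p)
    true_pos = sum(1 for t, p in pairs if p == positive and t == positive)
    false_pos = sum(1 for t, p in pairs if p == positive and t != positive)

    accuracy = 100.0 * correct / len(pairs)
    predicted_pos = true_pos + false_pos
    # Precision is undefined when nothing was predicted positive.
    precision = 100.0 * true_pos / predicted_pos if predicted_pos else float("nan")
    return accuracy, precision


# Hypothetical labels (1 = tumor, 0 = normal); NOT data from the paper.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
acc, prec = accuracy_and_precision(y_true, y_pred)
print(f"accuracy={acc:.2f} precision={prec:.2f}")
```

Because the two metrics count different things, they can diverge: a classifier can label every sample correctly in one class and still produce false positives in the other, so the two figures are best read together.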


Similar items