
Filtered selection coupled with support vector machines generate a functionally relevant prediction model for colorectal cancer

PURPOSE: There has been considerable interest in using whole-genome expression profiles for the classification of colorectal cancer (CRC). The selection of important features is a crucial step before training a classifier. METHODS: In this study, we built a model that uses support vector machine (SV...


Bibliographic Details
Main authors: Gabere, Musa Nur, Hussein, Mohamed Aly, Aziz, Mohammad Azhar
Format: Online Article Text
Language: English
Published: Dove Medical Press 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4898422/
https://www.ncbi.nlm.nih.gov/pubmed/27330311
http://dx.doi.org/10.2147/OTT.S98910
author Gabere, Musa Nur
Hussein, Mohamed Aly
Aziz, Mohammad Azhar
collection PubMed
description PURPOSE: There has been considerable interest in using whole-genome expression profiles for the classification of colorectal cancer (CRC). The selection of important features is a crucial step before training a classifier. METHODS: In this study, we built a model that uses a support vector machine (SVM) to classify cancer and normal samples using Affymetrix exon microarray data obtained from 90 samples of 48 patients diagnosed with CRC. From the 22,011 genes, we selected the 20, 30, 50, 100, 200, 300, and 500 genes most relevant to CRC using the minimum-redundancy–maximum-relevance (mRMR) technique. With these gene sets, an SVM model was designed using four different kernel types (linear, polynomial, radial basis function [RBF], and sigmoid). RESULTS: The best model, which used 30 genes and the RBF kernel, outperformed the other combinations; it had an accuracy of 84% for both tenfold and leave-one-out cross-validation in discriminating the cancer samples from the normal samples. With this 30-gene set from mRMR, six classifiers were trained using random forest (RF), Bayes net (BN), multilayer perceptron (MLP), naïve Bayes (NB), reduced error pruning tree (REPT), and SVM. Two hybrids, mRMR + SVM and mRMR + BN, were the best models when tested on other datasets, achieving prediction accuracies of 95.27% and 91.99%, respectively, and outperforming the other mRMR hybrid models (mRMR + RF, mRMR + NB, mRMR + REPT, and mRMR + MLP). Ingenuity pathway analysis was used to analyze the functions of the 30 genes selected for this model and their potential association with CRC: CDH3, CEACAM7, CLDN1, IL8, IL6R, MMP1, MMP7, and TGFB1 were predicted to be CRC biomarkers. CONCLUSION: This model could be used to further develop a diagnostic tool for predicting CRC based on gene expression data from patient samples.
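The mRMR step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes expression values have already been discretized (here into 0/1), uses the common "relevance minus mean redundancy" difference criterion, and runs on a made-up toy dataset. The names `mutual_information`, `mrmr_select`, and `gene_a` through `gene_d` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # Sum p(x, y) * log(p(x, y) / (p(x) * p(y))) over observed pairs.
    return sum(
        (c / n) * log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mrmr_select(features, labels, k):
    """Greedy mRMR: add the feature with the highest relevance minus mean redundancy.

    features: dict mapping a feature name to its list of discretized values
              (discretization is an assumption of this sketch).
    """
    relevance = {
        name: mutual_information(values, labels)
        for name, values in features.items()
    }
    selected = []
    remaining = set(features)
    while remaining and len(selected) < k:

        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(
                mutual_information(features[name], features[s]) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy

        # Iterating in sorted order makes tie-breaking deterministic.
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (fabricated for illustration): 4 normal (0) and 4 cancer (1) samples.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "gene_a": [0, 0, 0, 1, 1, 1, 1, 1],  # informative, one mismatch
    "gene_b": [0, 0, 0, 1, 1, 1, 1, 1],  # exact copy of gene_a (redundant)
    "gene_c": [0, 0, 0, 0, 0, 0, 1, 1],  # weaker but partly complementary
    "gene_d": [0, 1, 0, 1, 0, 1, 0, 1],  # unrelated to the labels
}

selected = mrmr_select(features, labels, k=2)
print(selected)  # ['gene_a', 'gene_c']
```

In this toy case the duplicate `gene_b` is as relevant as `gene_a` but is skipped in the second round because its redundancy with the already selected feature outweighs its relevance. The selected subset would then be passed to a classifier such as an SVM.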
format Online
Article
Text
id pubmed-4898422
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Dove Medical Press
record_format MEDLINE/PubMed
spelling pubmed-4898422 2016-06-21 Onco Targets Ther Original Research Dove Medical Press 2016-06-01 /pmc/articles/PMC4898422/ /pubmed/27330311 http://dx.doi.org/10.2147/OTT.S98910 Text en © 2016 Gabere et al. This work is published and licensed by Dove Medical Press Limited. The full terms of this license are available at https://www.dovepress.com/terms.php and incorporate the Creative Commons Attribution – Non Commercial (unported, v3.0) License (http://creativecommons.org/licenses/by-nc/3.0/). By accessing the work you hereby accept the Terms. Non-commercial uses of the work are permitted without any further permission from Dove Medical Press Limited, provided the work is properly attributed.
title Filtered selection coupled with support vector machines generate a functionally relevant prediction model for colorectal cancer
topic Original Research