Case-Based Retrieval Framework for Gene Expression Data

BACKGROUND: The process of retrieving similar cases in a case-based reasoning system is considered a major challenge for gene expression data sets. The large number of gene expression values generated by microarray technology leads to complex data sets, and similarity measures for high-dimensional data...

Bibliographic Details
Main authors: Anaissi, Ali; Goyal, Madhu; Catchpoole, Daniel R; Braytee, Ali; Kennedy, Paul J
Format: Online, Article, Text
Language: English
Published: Libertas Academica, 2015
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4368049/
https://www.ncbi.nlm.nih.gov/pubmed/25861214
http://dx.doi.org/10.4137/CIN.S22371
collection PubMed
description BACKGROUND: The process of retrieving similar cases in a case-based reasoning system is considered a major challenge for gene expression data sets. The large number of gene expression values generated by microarray technology leads to complex data sets, and similarity measures for high-dimensional data are problematic. Hence, gene expression similarity measurements require numerous machine-learning and data-mining techniques, such as feature selection and dimensionality reduction, to be incorporated into the retrieval process.
METHODS: This article proposes a case-based retrieval framework that uses a k-nearest-neighbor classifier with a weighted-feature-based similarity to retrieve previously treated patients based on their gene expression profiles.
RESULTS: The proposed methodology is validated on several data sets: a childhood leukemia data set collected from The Children’s Hospital at Westmead, as well as the Colon cancer, National Cancer Institute (NCI), and Prostate cancer data sets. In retrieving patients similar to new patients, the framework achieved the following accuracies: 96% on the childhood leukemia data set, 95% on the NCI data set, 93% on the Colon cancer data set, and 98% on the Prostate cancer data set.
CONCLUSION: The designed case-based retrieval framework is an appropriate choice for retrieving previous patients who are similar to a new patient, on the basis of their gene expression data, for better diagnosis and treatment of childhood leukemia. Moreover, this framework can be applied to other gene expression data sets using some or all of its steps.
id pubmed-4368049
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-4368049 2015-04-08 Case-Based Retrieval Framework for Gene Expression Data; Anaissi, Ali; Goyal, Madhu; Catchpoole, Daniel R; Braytee, Ali; Kennedy, Paul J; Cancer Inform; Methodology
Libertas Academica 2015-03-19 /pmc/articles/PMC4368049/ /pubmed/25861214 http://dx.doi.org/10.4137/CIN.S22371 Text en © 2015 the author(s), publisher and licensee Libertas Academica Ltd. This is an open-access article distributed under the terms of the Creative Commons CC-BY-NC 3.0 License.
topic Methodology
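The METHODS summary in the abstract names a k-nearest-neighbor classifier with a weighted-feature-based similarity, but this record does not say how the feature weights are obtained. The following is only a minimal sketch, assuming a weighted Euclidean distance whose per-gene weights come from some earlier feature-selection step. The function names `weighted_distance`, `retrieve` and `classify`, the weight values and the toy profiles are all hypothetical illustrations, not the authors' implementation.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

# A stored case: (hypothetical case id, gene expression profile, known class label).
Case = Tuple[str, Sequence[float], str]


def weighted_distance(a: Sequence[float], b: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted Euclidean distance sqrt(sum_i w_i * (a_i - b_i)**2).

    A weight of 0 removes a gene from the comparison entirely, which is one way
    a feature-selection step can be folded into the similarity measure.
    """
    if not (len(a) == len(b) == len(weights)):
        raise ValueError("profiles and weights must have the same length")
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def retrieve(query: Sequence[float], case_base: List[Case],
             weights: Sequence[float], k: int = 3) -> List[Tuple[str, float, str]]:
    """Return the k stored cases nearest to `query` as (case_id, distance, label)."""
    scored = [(cid, weighted_distance(query, profile, weights), label)
              for cid, profile, label in case_base]
    scored.sort(key=lambda item: item[1])  # smallest distance = most similar case
    return scored[:k]


def classify(neighbours: List[Tuple[str, float, str]]) -> str:
    """k-NN decision: majority vote over the labels of the retrieved cases."""
    return Counter(label for _, _, label in neighbours).most_common(1)[0][0]


if __name__ == "__main__":
    # Toy three-gene profiles; values and labels are invented for illustration.
    case_base = [
        ("P1", [2.0, 0.1, 5.0], "A"),
        ("P2", [2.1, 0.3, 1.0], "A"),
        ("P3", [0.2, 0.2, 5.2], "B"),
        ("P4", [0.1, 0.4, 0.9], "B"),
    ]
    weights = [1.0, 0.0, 0.1]  # hypothetical: gene 2 dropped, gene 3 down-weighted
    query = [1.9, 0.9, 0.8]
    nearest = retrieve(query, case_base, weights, k=3)
    print([cid for cid, _, _ in nearest], classify(nearest))  # ['P2', 'P1', 'P4'] A
```

The retrieval step and the classification step are kept separate because a case-based reasoning system returns the similar previous patients themselves, not only a predicted label. The sketch does not normalise expression values; with real microarray data one would typically standardise each gene before applying the weights.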