Case-Based Retrieval Framework for Gene Expression Data
BACKGROUND: The process of retrieving similar cases in a case-based reasoning system is considered a big challenge for gene expression data sets. The huge number of gene expression values generated by microarray technology leads to complex data sets and similarity measures for high-dimensional data...
Main authors: | Anaissi, Ali; Goyal, Madhu; Catchpoole, Daniel R; Braytee, Ali; Kennedy, Paul J |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Libertas Academica 2015 |
Subjects: | Methodology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4368049/ https://www.ncbi.nlm.nih.gov/pubmed/25861214 http://dx.doi.org/10.4137/CIN.S22371 |
_version_ | 1782362593149583360 |
---|---|
author | Anaissi, Ali Goyal, Madhu Catchpoole, Daniel R Braytee, Ali Kennedy, Paul J |
author_sort | Anaissi, Ali |
collection | PubMed |
description | BACKGROUND: The process of retrieving similar cases in a case-based reasoning system is considered a big challenge for gene expression data sets. The huge number of gene expression values generated by microarray technology leads to complex data sets and similarity measures for high-dimensional data are problematic. Hence, gene expression similarity measurements require numerous machine-learning and data-mining techniques, such as feature selection and dimensionality reduction, to be incorporated into the retrieval process. METHODS: This article proposes a case-based retrieval framework that uses a k-nearest-neighbor classifier with a weighted-feature-based similarity to retrieve previously treated patients based on their gene expression profiles. RESULTS: The herein-proposed methodology is validated on several data sets: a childhood leukemia data set collected from The Children’s Hospital at Westmead, as well as the Colon cancer, the National Cancer Institute (NCI), and the Prostate cancer data sets. Results obtained by the proposed framework in retrieving patients of the data sets who are similar to new patients are as follows: 96% accuracy on the childhood leukemia data set, 95% on the NCI data set, 93% on the Colon cancer data set, and 98% on the Prostate cancer data set. CONCLUSION: The designed case-based retrieval framework is an appropriate choice for retrieving previous patients who are similar to a new patient, on the basis of their gene expression data, for better diagnosis and treatment of childhood leukemia. Moreover, this framework can be applied to other gene expression data sets using some or all of its steps. |
format | Online Article Text |
id | pubmed-4368049 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | Libertas Academica |
record_format | MEDLINE/PubMed |
spelling | pubmed-4368049 2015-04-08 Case-Based Retrieval Framework for Gene Expression Data Anaissi, Ali Goyal, Madhu Catchpoole, Daniel R Braytee, Ali Kennedy, Paul J Cancer Inform Methodology Libertas Academica 2015-03-19 /pmc/articles/PMC4368049/ /pubmed/25861214 http://dx.doi.org/10.4137/CIN.S22371 Text en © 2015 the author(s), publisher and licensee Libertas Academica Ltd. This is an open-access article distributed under the terms of the Creative Commons CC-BY-NC 3.0 License. |
title | Case-Based Retrieval Framework for Gene Expression Data |
topic | Methodology |
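
The METHODS summary in this record describes retrieving previously treated patients with a k-nearest-neighbor classifier over a weighted-feature-based similarity. The record does not say how the weights are derived or which distance is used, so the following is only a minimal sketch under stated assumptions: it uses a weighted Euclidean distance, and the function names, the gene weights, and the toy profiles are all hypothetical and are not taken from the article.

```python
import math
from collections import Counter


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two expression profiles.

    Assumption: the article's actual similarity measure is not given in
    this record; weighted Euclidean distance is used here for illustration.
    """
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def retrieve_similar_cases(query, cases, weights, k=3):
    """Return the k stored cases closest to the query, nearest first.

    `cases` is a list of (profile, label) pairs.
    """
    ranked = sorted(
        cases, key=lambda case: weighted_distance(query, case[0], weights)
    )
    return ranked[:k]


def predict_label(query, cases, weights, k=3):
    """Majority label among the k retrieved cases."""
    neighbours = retrieve_similar_cases(query, cases, weights, k)
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Toy, made-up case base: three "genes" per patient, two outcome labels.
case_base = [
    ([0.9, 0.1, 0.5], "A"),
    ([0.8, 0.2, 0.4], "A"),
    ([0.1, 0.9, 0.5], "B"),
    ([0.2, 0.8, 0.6], "B"),
]
# Hypothetical weights: the third gene is treated as uninformative.
gene_weights = [1.0, 1.0, 0.0]
new_patient = [0.88, 0.1, 0.9]

print(retrieve_similar_cases(new_patient, case_base, gene_weights, k=2))
print(predict_label(new_patient, case_base, gene_weights, k=3))
```

Setting a gene's weight to zero removes it from the comparison entirely, which is one simple way a feature-selection step could feed into retrieval; graded weights would instead down-weight less informative genes without discarding them.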