
Identification of disease-causing genes using microarray data mining and Gene Ontology

Bibliographic Details
Main Authors: Mohammadi, Azadeh; Saraee, Mohammad H; Salehi, Mansoor
Format: Text
Language: English
Published: BioMed Central, 2011
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3037837/
https://www.ncbi.nlm.nih.gov/pubmed/21269461
http://dx.doi.org/10.1186/1755-8794-4-12
author Mohammadi, Azadeh
Saraee, Mohammad H
Salehi, Mansoor
collection PubMed
description BACKGROUND: One of the best and most accurate methods for identifying disease-causing genes is monitoring gene expression values in different samples using microarray technology. One of the shortcomings of microarray data is that they provide a small number of samples relative to the number of genes. This reduces the classification accuracy of existing methods, so gene selection is essential to improve predictive accuracy and to identify potential marker genes for a disease. Among the numerous existing methods for gene selection, support vector machine-based recursive feature elimination (SVMRFE) has become one of the leading methods, but its performance can suffer from the small sample size, noisy data and the fact that the method does not remove redundant genes.
METHODS: We propose a novel framework for gene selection that exploits the strengths of conventional methods and addresses their weaknesses. Specifically, we combine the Fisher method and SVMRFE to utilize the advantages of a filter method as well as an embedded method. Furthermore, we add a redundancy reduction stage to address a weakness shared by the Fisher method and SVMRFE. In addition to gene expression values, the proposed method uses Gene Ontology, which is a reliable source of information on genes. The use of Gene Ontology can partly compensate for the limitations of microarrays, such as a small number of samples and erroneous measurements.
RESULTS: The proposed method has been applied to colon, Diffuse Large B-Cell Lymphoma (DLBCL) and prostate cancer datasets. The empirical results show that our method improves classification performance in terms of accuracy, sensitivity and specificity. In addition, a study of the molecular function of the selected genes strengthened the hypothesis that these genes are involved in cancer growth.
CONCLUSIONS: The proposed method addresses the weaknesses of conventional methods by adding a redundancy reduction stage and utilizing Gene Ontology information. It predicts marker genes for colon, DLBCL and prostate cancer with high accuracy. The predictions made in this study can serve as a list of candidates for subsequent wet-lab verification and might help in the search for a cure for cancers.
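The METHODS paragraph pairs a Fisher-criterion filter with a redundancy reduction stage. As a minimal sketch, assuming the common two-class Fisher score (mu1 - mu2)^2 / (var1 + var2) and a hypothetical Pearson-correlation cut-off (`max_corr`) as the redundancy test, such a filter could look like the Python below. The record does not give the authors' exact formulas, and the Gene Ontology information they also use is not modelled here; `fisher_score`, `pearson` and `select_genes` are illustrative names only.

```python
import math
from statistics import mean, pvariance


def fisher_score(values, labels):
    """Assumed two-class Fisher criterion for one gene: (mu1 - mu2)^2 / (var1 + var2)."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    denom = pvariance(pos) + pvariance(neg)
    if denom == 0:  # both classes constant: separable only if their means differ
        return 0.0 if mean(pos) == mean(neg) else math.inf
    return (mean(pos) - mean(neg)) ** 2 / denom


def pearson(a, b):
    """Pearson correlation; returns 0.0 when either profile is constant."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return 0.0 if sa == 0 or sb == 0 else cov / (sa * sb)


def select_genes(expr, labels, top_k, max_corr=0.9):
    """Rank genes by Fisher score, then skip any gene too correlated with one already kept.

    expr maps gene name -> expression values (one per sample); max_corr is a
    hypothetical redundancy threshold, not a value taken from the paper.
    """
    ranked = sorted(expr, key=lambda g: fisher_score(expr[g], labels), reverse=True)
    selected = []
    for g in ranked:
        if all(abs(pearson(expr[g], expr[s])) < max_corr for s in selected):
            selected.append(g)
        if len(selected) == top_k:
            break
    return selected


# Toy data: 3 tumour (1) and 3 normal (0) samples; values are invented for illustration.
labels = [1, 1, 1, 0, 0, 0]
expr = {
    "g1": [5.0, 5.2, 4.8, 1.0, 1.1, 0.9],  # strong class separation
    "g2": [4.0, 4.6, 3.4, 1.5, 1.0, 2.0],  # weaker, but highly correlated with g1
    "g3": [3.0, 1.0, 2.0, 1.5, 2.5, 2.0],  # uninformative, uncorrelated with g1
    "g4": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],  # constant
}
print(select_genes(expr, labels, top_k=2))  # → ['g1', 'g3']
```

Here g2 ranks second by Fisher score but is dropped as redundant with g1, which is the kind of gene a pure Fisher or SVMRFE ranking would keep.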
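SVMRFE, the embedded method named in the abstract, ranks genes by repeatedly training a linear SVM and discarding the gene with the smallest squared weight. The sketch below is a toy, dependency-free illustration of that standard procedure, not the authors' implementation: `train_linear_svm` is a hypothetical full-batch subgradient solver standing in for a real SVM library, its `lam`, `lr` and `epochs` defaults are arbitrary, and it removes one gene per round, whereas practical implementations often remove several at a time.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=500):
    """Linear soft-margin SVM fitted by full-batch subgradient descent (toy solver).

    Minimises lam/2 * ||w||^2 + mean(max(0, 1 - y * (w.x + b))); labels are -1 or +1.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # hinge loss is active for this sample
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def svm_rfe(X, y, names):
    """Rank features best-first by recursively dropping the one with the smallest w_j^2."""
    remaining = list(range(len(names)))
    eliminated = []
    while remaining:
        sub = [[row[j] for j in remaining] for row in X]
        w, _ = train_linear_svm(sub, y)
        worst = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        eliminated.append(remaining.pop(worst))
    # The feature eliminated last is the most important one.
    return [names[j] for j in reversed(eliminated)]


# Toy data: feature "a" separates the classes, feature "b" is class-independent noise.
X = [[2.0, 1.0], [3.0, -1.0], [2.5, 0.0], [-2.0, 1.0], [-3.0, -1.0], [-2.5, 0.0]]
y = [1, 1, 1, -1, -1, -1]
print(svm_rfe(X, y, ["a", "b"]))  # → ['a', 'b']
```

In the framework the abstract describes, a step like this would run on genes that survived the filter stage, so the costly retraining loop sees far fewer than the full set of microarray probes.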
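The RESULTS paragraph reports accuracy, sensitivity and specificity. These follow from confusion-matrix counts as accuracy = (TP + TN) / N, sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). The helper below (a hypothetical name, `evaluate`) computes them, treating label 1 as the positive, diseased class by assumption.

```python
def evaluate(y_true, y_pred, positive=1):
    """Return (accuracy, sensitivity, specificity); label `positive` marks disease."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    # Undefined ratios (no positives or no negatives in y_true) are reported as NaN.
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return accuracy, sensitivity, specificity


print(evaluate([1, 1, 1, 0, 0], [1, 1, 0, 0, 1]))  # → (0.6, 0.6666666666666666, 0.5)
```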
format Text
id pubmed-3037837
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3037837 2011-02-18 Identification of disease-causing genes using microarray data mining and Gene Ontology Mohammadi, Azadeh Saraee, Mohammad H Salehi, Mansoor BMC Med Genomics Research Article BioMed Central 2011-01-26 /pmc/articles/PMC3037837/ /pubmed/21269461 http://dx.doi.org/10.1186/1755-8794-4-12 Text en Copyright ©2011 Mohammadi et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Identification of disease-causing genes using microarray data mining and Gene Ontology
topic Research Article