Discovery of dominant and dormant genes from expression data using a novel generalization of SNR for multi-class problems

BACKGROUND: The Signal-to-Noise-Ratio (SNR) is often used for identification of biomarkers for two-class problems and no formal and useful generalization of SNR is available for multiclass problems. We propose innovative generalizations of SNR for multiclass cancer discrimination through introductio...

Bibliographic Details
Main Authors: Tsai, Yu-Shuen; Lin, Chin-Teng; Tseng, George C; Chung, I-Fang; Pal, Nikhil Ranjan
Format: Text
Language: English
Published: BioMed Central 2008
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2620271/
https://www.ncbi.nlm.nih.gov/pubmed/18842155
http://dx.doi.org/10.1186/1471-2105-9-425
collection PubMed
description BACKGROUND: The Signal-to-Noise-Ratio (SNR) is often used for identification of biomarkers for two-class problems and no formal and useful generalization of SNR is available for multiclass problems. We propose innovative generalizations of SNR for multiclass cancer discrimination through introduction of two indices, Gene Dominant Index and Gene Dormant Index (GDIs). These two indices lead to the concepts of dominant and dormant genes with biological significance. We use these indices to develop methodologies for discovery of dominant and dormant biomarkers with interesting biological significance. The dominancy and dormancy of the identified biomarkers and their excellent discriminating power are also demonstrated pictorially using the scatterplot of individual gene and 2-D Sammon's projection of the selected set of genes. Using information from the literature we have shown that the GDI based method can identify dominant and dormant genes that play significant roles in cancer biology. These biomarkers are also used to design diagnostic prediction systems.
RESULTS AND DISCUSSION: To evaluate the effectiveness of the GDIs, we have used four multiclass cancer data sets (Small Round Blue Cell Tumors, Leukemia, Central Nervous System Tumors, and Lung Cancer). For each data set we demonstrate that the new indices can find biologically meaningful genes that can act as biomarkers. We then use six machine learning tools, Nearest Neighbor Classifier (NNC), Nearest Mean Classifier (NMC), Support Vector Machine (SVM) classifier with linear kernel, and SVM classifier with Gaussian kernel, where both SVMs are used in conjunction with one-vs-all (OVA) and one-vs-one (OVO) strategies. We found GDIs to be very effective in identifying biomarkers with strong class specific signatures. With all six tools and for all data sets we could achieve better or comparable prediction accuracies usually with fewer marker genes than results reported in the literature using the same computational protocols. The dominant genes are usually easy to find while good dormant genes may not always be available as dormant genes require stronger constraints to be satisfied; but when they are available, they can be used for authentication of diagnosis.
CONCLUSION: Since GDI based schemes can find a small set of dominant/dormant biomarkers that is adequate to design diagnostic prediction systems, it opens up the possibility of using real-time qPCR assays or antibody based methods such as ELISA for an easy and low cost diagnosis of diseases. The dominant and dormant genes found by GDIs can be used in different ways to design more reliable diagnostic prediction systems.
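The abstract refers to the two-class SNR but this record does not give its formula, nor the definitions of the two GDIs. As a rough illustration only, the sketch below uses the conventional two-class form, (mean_a - mean_b) / (sd_a + sd_b), which is an assumption here rather than something stated in the record. The function names `snr` and `rank_genes`, the one-vs-rest split, and the toy expression values are hypothetical; this is not the authors' GDI method.

```python
from statistics import mean, stdev


def snr(class_a, class_b):
    """Conventional two-class SNR for one gene: (mean_a - mean_b) / (sd_a + sd_b)."""
    spread = stdev(class_a) + stdev(class_b)
    if spread == 0:
        raise ValueError("both classes have zero spread; SNR is undefined")
    return (mean(class_a) - mean(class_b)) / spread


def rank_genes(expression, labels, positive):
    """Rank genes by |SNR| between `positive` samples and all other samples.

    expression: dict mapping gene name -> list of values, one per sample
    labels:     list of class labels, aligned with the sample order
    """
    scores = {}
    for gene, values in expression.items():
        pos = [v for v, lab in zip(values, labels) if lab == positive]
        neg = [v for v, lab in zip(values, labels) if lab != positive]
        scores[gene] = snr(pos, neg)
    # Largest absolute SNR first: the strongest class separation.
    return sorted(scores, key=lambda g: abs(scores[g]), reverse=True)


# Toy data (made up for illustration): 3 samples of class "A", 3 of class "B".
labels = ["A", "A", "A", "B", "B", "B"]
expression = {
    "g1": [10, 11, 12, 1, 2, 3],  # clearly higher in class A
    "g2": [5, 6, 7, 5, 6, 8],     # almost no separation
}
ranking = rank_genes(expression, labels, positive="A")
```

With more than two classes, a one-vs-rest split like this hides how the remaining classes differ from one another, which is the kind of gap a multiclass generalization is meant to address.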
format Text
id pubmed-2620271
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2620271 2009-01-13 BMC Bioinformatics Research Article BioMed Central 2008-10-09 /pmc/articles/PMC2620271/ /pubmed/18842155 http://dx.doi.org/10.1186/1471-2105-9-425 Text en Copyright © 2008 Tsai et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Discovery of dominant and dormant genes from expression data using a novel generalization of SNR for multi-class problems
topic Research Article