A feature selection approach for identification of signature genes from SAGE data

Bibliographic Details
Main authors: Barrera, Junior, Cesar, Roberto M, Humes, Carlos, Martins, David C, Patrão, Diogo FC, Silva, Paulo JS, Brentani, Helena
Format: Text
Language: English
Published: BioMed Central 2007
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1891113/
https://www.ncbi.nlm.nih.gov/pubmed/17519038
http://dx.doi.org/10.1186/1471-2105-8-169
author Barrera, Junior
Cesar, Roberto M
Humes, Carlos
Martins, David C
Patrão, Diogo FC
Silva, Paulo JS
Brentani, Helena
collection PubMed
description BACKGROUND: One goal of gene expression profiling is to identify signature genes that robustly distinguish different types or grades of tumors. Several tumor classifiers based on expression profiling have been proposed using microarray technique. Due to important differences in the probabilistic models of microarray and SAGE technologies, it is important to develop suitable techniques to select specific genes from SAGE measurements. RESULTS: A new framework to select specific genes that distinguish different biological states based on the analysis of SAGE data is proposed. The new framework applies the bolstered error for the identification of strong genes that separate the biological states in a feature space defined by the gene expression of a training set. Credibility intervals defined from a probabilistic model of SAGE measurements are used to identify the genes that distinguish the different states with more reliability among all gene groups selected by the strong genes method. A score taking into account the credibility and the bolstered error values in order to rank the groups of considered genes is proposed. Results obtained using SAGE data from gliomas are presented, thus corroborating the introduced methodology. CONCLUSION: The model representing counting data, such as SAGE, provides additional statistical information that allows a more robust analysis. The additional statistical information provided by the probabilistic model is incorporated in the methodology described in the paper. The introduced method is suitable to identify signature genes that lead to a good separation of the biological states using SAGE and may be adapted for other counting methods such as Massive Parallel Signature Sequencing (MPSS) or the recent Sequencing-By-Synthesis (SBS) technique. Some of such genes identified by the proposed method may be useful to generate classifiers.
format Text
id pubmed-1891113
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1891113 2007-06-13 A feature selection approach for identification of signature genes from SAGE data BMC Bioinformatics Research Article BioMed Central 2007-05-22 /pmc/articles/PMC1891113/ /pubmed/17519038 http://dx.doi.org/10.1186/1471-2105-8-169 Text en Copyright © 2007 Barrera et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A feature selection approach for identification of signature genes from SAGE data
topic Research Article