Semi-supervised discovery of differential genes

BACKGROUND: Various statistical scores have been proposed for evaluating the significance of genes that may exhibit differential expression between two or more controlled conditions. However, in many clinical studies to detect clinical marker genes for example, the conditions have not necessarily be...

Bibliographic Details
Main authors: Oba, Shigeyuki; Ishii, Shin
Format: Text
Language: English
Published: BioMed Central 2006
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1584253/
https://www.ncbi.nlm.nih.gov/pubmed/16981994
http://dx.doi.org/10.1186/1471-2105-7-414
author Oba, Shigeyuki
Ishii, Shin
collection PubMed
description BACKGROUND: Various statistical scores have been proposed for evaluating the significance of genes that may exhibit differential expression between two or more controlled conditions. However, in many clinical studies, for example those that aim to detect clinical marker genes, the conditions are not necessarily well controlled, and condition labels are sometimes hard to obtain because of physical, financial, and time costs. In such situations, we can consider an unsupervised case, where no labels are available, or a semi-supervised case, where labels are available for only part of the sample set, rather than the well-studied supervised case, where all samples are labeled. RESULTS: We assume a latent variable model for the expression of active genes and apply the optimal discovery procedure (ODP) proposed by Storey (2005) to the model. Our latent variable model allows gene significance scores to be applied to unsupervised and semi-supervised cases. The ODP framework improves detectability by sharing the estimated parameters of the null and alternative models of multiple tests across multiple genes. A theoretical consideration leads to two different interpretations of the latent variable: either it affects the alternative model only implicitly, through the model parameters, or it is explicitly included in the alternative model. These interpretations correspond to two different implementations of ODP. By comparing the two implementations in experiments with simulated data, we found that sharing the latent variable estimation is effective for increasing the detectability of truly active genes. We also show that unsupervised and semi-supervised rating of genes, which takes into account the samples without condition labels, can improve detection of active genes in real gene discovery problems.
CONCLUSION: The experimental results indicate that the ODP framework is effective for hypotheses including latent variables and is further improved by sharing the estimations of hidden variables over multiple tests.
format Text
id pubmed-1584253
institution National Center for Biotechnology Information
language English
publishDate 2006
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1584253 2006-10-02 Semi-supervised discovery of differential genes Oba, Shigeyuki Ishii, Shin BMC Bioinformatics Methodology Article BioMed Central 2006-09-18 /pmc/articles/PMC1584253/ /pubmed/16981994 http://dx.doi.org/10.1186/1471-2105-7-414 Text en Copyright © 2006 Oba and Ishii; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Semi-supervised discovery of differential genes
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1584253/
https://www.ncbi.nlm.nih.gov/pubmed/16981994
http://dx.doi.org/10.1186/1471-2105-7-414