Empirical study of supervised gene screening

BACKGROUND: Microarray studies provide a way of linking variations of phenotypes with their genetic causations. Constructing predictive models using high dimensional microarray measurements usually consists of three steps: (1) unsupervised gene screening; (2) supervised gene screening; and (3) stati...

Bibliographic Details
Main Author:	Ma, Shuangge
Format:	Text
Language:	English
Published:	BioMed Central 2006
Subjects:	Methodology Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1764766/ https://www.ncbi.nlm.nih.gov/pubmed/17176468 http://dx.doi.org/10.1186/1471-2105-7-537

author	Ma, Shuangge
collection	PubMed
description	BACKGROUND: Microarray studies provide a way of linking variations of phenotypes with their genetic causations. Constructing predictive models using high dimensional microarray measurements usually consists of three steps: (1) unsupervised gene screening; (2) supervised gene screening; and (3) statistical model building. Supervised gene screening based on marginal gene ranking is commonly used to reduce the number of genes in the model building. Various simple statistics, such as the t-statistic or the signal-to-noise ratio, have been used to rank genes in the supervised screening. Despite its extensive use, statistical study of supervised gene screening remains scarce. Our study is partly motivated by the differences in gene discovery results caused by using different supervised gene screening methods. RESULTS: We investigate concordance and reproducibility of supervised gene screening based on eight commonly used marginal statistics. Concordance is assessed by the relative fractions of overlaps between top ranked genes screened using different marginal statistics. We propose a Bootstrap Reproducibility Index, which measures reproducibility of individual genes under the supervised screening. Empirical studies are based on four public microarray data sets. We consider the cases where the top 20%, 40% and 60% of genes are screened. CONCLUSION: From a gene discovery point of view, the effect of supervised gene screening based on different marginal statistics cannot be ignored. Empirical studies show that (1) genes that pass different supervised screenings may be considerably different; (2) concordance may vary, depending on the underlying data structure and percentage of selected genes; (3) evaluated with the Bootstrap Reproducibility Index, genes that pass supervised screening are only moderately reproducible; and (4) concordance cannot be improved by supervised screening based on reproducibility.
format	Text
id	pubmed-1764766
institution	National Center for Biotechnology Information
language	English
publishDate	2006
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-1764766 2007-01-10 Empirical study of supervised gene screening Ma, Shuangge BMC Bioinformatics Methodology Article BioMed Central 2006-12-18 /pmc/articles/PMC1764766/ /pubmed/17176468 http://dx.doi.org/10.1186/1471-2105-7-537 Text en Copyright © 2006 Ma; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Empirical study of supervised gene screening
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1764766/ https://www.ncbi.nlm.nih.gov/pubmed/17176468 http://dx.doi.org/10.1186/1471-2105-7-537
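
The abstract describes two measurements: concordance between gene lists screened with different marginal statistics, and a bootstrap-based measure of how reproducibly each gene is selected. The sketch below illustrates both ideas in plain Python. It is not the author's implementation. The function names, the Welch form of the t-statistic, the particular signal-to-noise formula, the stratified resampling, and the use of per-gene selection frequency as a stand-in for the Bootstrap Reproducibility Index are all assumptions made for illustration; the record does not give the paper's exact definitions.

```python
import random
from statistics import mean, stdev


def t_statistic(x, y):
    """Absolute Welch t-statistic for two groups (assumed form)."""
    se = (stdev(x) ** 2 / len(x) + stdev(y) ** 2 / len(y)) ** 0.5
    return abs(mean(x) - mean(y)) / se if se > 0 else 0.0


def signal_to_noise(x, y):
    """Absolute signal-to-noise ratio: |mean difference| / (sd_x + sd_y) (assumed form)."""
    denom = stdev(x) + stdev(y)
    return abs(mean(x) - mean(y)) / denom if denom > 0 else 0.0


def top_genes(data, labels, statistic, fraction):
    """Return the set of gene indices ranked in the top `fraction` by `statistic`.

    data: list of samples, each a list of per-gene expression values.
    labels: list of 0/1 class labels, one per sample (at least 2 per class).
    """
    n_genes = len(data[0])
    scores = []
    for g in range(n_genes):
        x = [row[g] for row, lab in zip(data, labels) if lab == 0]
        y = [row[g] for row, lab in zip(data, labels) if lab == 1]
        scores.append(statistic(x, y))
    k = max(1, round(fraction * n_genes))
    ranked = sorted(range(n_genes), key=lambda g: scores[g], reverse=True)
    return set(ranked[:k])


def concordance(set_a, set_b):
    """Overlap of two equally sized gene sets, as a fraction of the set size."""
    return len(set_a & set_b) / len(set_a)


def bootstrap_selection_frequency(data, labels, statistic, fraction,
                                  n_boot=100, seed=0):
    """Fraction of bootstrap resamples in which each gene is screened in.

    Hypothetical stand-in for a reproducibility index, not the paper's formula.
    """
    rng = random.Random(seed)
    counts = [0] * len(data[0])
    idx0 = [i for i, lab in enumerate(labels) if lab == 0]
    idx1 = [i for i, lab in enumerate(labels) if lab == 1]
    for _ in range(n_boot):
        # Resample within each class so both classes stay represented.
        sample = rng.choices(idx0, k=len(idx0)) + rng.choices(idx1, k=len(idx1))
        boot_data = [data[i] for i in sample]
        boot_labels = [labels[i] for i in sample]
        for g in top_genes(boot_data, boot_labels, statistic, fraction):
            counts[g] += 1
    return [c / n_boot for c in counts]
```

Calling `top_genes` once per marginal statistic with `fraction` set to 0.2, 0.4 or 0.6 and passing the resulting sets to `concordance` gives a pairwise overlap fraction; `bootstrap_selection_frequency` returns one value per gene between 0 and 1, where values near 1 indicate a gene that is selected in almost every resample.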