
Reproducible detection of disease-associated markers from gene expression data

BACKGROUND: Detection of disease-associated markers plays a crucial role in gene screening for biological studies. Two-sample test statistics, such as the t-statistic, are widely used to rank genes based on gene expression data. However, the resultant gene ranking is often not reproducible among dif...


Bibliographic Details
Main Authors: Omae, Katsuhiro, Komori, Osamu, Eguchi, Shinto
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4991096/
https://www.ncbi.nlm.nih.gov/pubmed/27538512
http://dx.doi.org/10.1186/s12920-016-0214-5
author Omae, Katsuhiro
Komori, Osamu
Eguchi, Shinto
collection PubMed
description BACKGROUND: Detection of disease-associated markers plays a crucial role in gene screening for biological studies. Two-sample test statistics, such as the t-statistic, are widely used to rank genes based on gene expression data. However, the resultant gene ranking is often not reproducible among different data sets. Such irreproducibility may be caused by disease heterogeneity. RESULTS: When we divided data into two subsets, we found that the signs of the two t-statistics were often reversed. Focusing on such instability, we proposed a sign-sum statistic that counts the signs of the t-statistics for all possible subsets. The proposed method excludes genes affected by heterogeneity, thereby improving the reproducibility of gene ranking. We compared the sign-sum statistic with the t-statistic by a theoretical evaluation of the upper confidence limit. Through simulations and applications to real data sets, we show that the sign-sum statistic exhibits superior performance. CONCLUSION: We derive the sign-sum statistic to obtain a robust gene ranking. The sign-sum statistic gives a more reproducible ranking than the t-statistic. Using simulated data sets, we show that the sign-sum statistic excludes hetero-type genes well. For the real data sets as well, the sign-sum statistic performs well in terms of ranking reproducibility. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12920-016-0214-5) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4991096
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4991096 2016-08-20 Reproducible detection of disease-associated markers from gene expression data Omae, Katsuhiro Komori, Osamu Eguchi, Shinto BMC Med Genomics Research Article
BioMed Central 2016-08-18 /pmc/articles/PMC4991096/ /pubmed/27538512 http://dx.doi.org/10.1186/s12920-016-0214-5 Text en © The Author(s) 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License(http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Reproducible detection of disease-associated markers from gene expression data
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4991096/
https://www.ncbi.nlm.nih.gov/pubmed/27538512
http://dx.doi.org/10.1186/s12920-016-0214-5
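
The abstract describes the sign-sum statistic only in words: it counts the signs of the t-statistics over all possible subsets of the data. The record does not give the formula, so the following Python sketch is an illustration under stated assumptions, not the authors' implementation. It assumes that a "subset" means a random half of the cases paired with a random half of the controls, and it approximates "all possible subsets" with a fixed number of random draws. The names `t_statistic` and `sign_sum_sketch`, and the parameters `n_draws`, `frac`, and `seed`, are hypothetical.

```python
import numpy as np


def t_statistic(x, y):
    """Welch two-sample t-statistic for 1-D arrays x (cases) and y (controls)."""
    diff = x.mean() - y.mean()
    se = np.sqrt(x.var(ddof=1) / len(x) + y.var(ddof=1) / len(y))
    if se == 0:
        # Degenerate subset with no spread: keep only the direction of the difference.
        return 0.0 if diff == 0 else float(np.copysign(np.inf, diff))
    return diff / se


def sign_sum_sketch(cases, controls, n_draws=500, frac=0.5, seed=0):
    """Average sign of the t-statistic over random subsets (assumed scheme).

    Returns a value in [-1, 1]. Values near +1 or -1 mean the direction of the
    difference is stable across subsets; values near 0 mean the sign flips often.
    """
    cases = np.asarray(cases, dtype=float)
    controls = np.asarray(controls, dtype=float)
    rng = np.random.default_rng(seed)
    k_cases = max(2, int(round(frac * len(cases))))
    k_controls = max(2, int(round(frac * len(controls))))
    total = 0.0
    for _ in range(n_draws):
        # Draw one subset from each group without replacement.
        sub_cases = rng.choice(cases, size=k_cases, replace=False)
        sub_controls = rng.choice(controls, size=k_controls, replace=False)
        total += np.sign(t_statistic(sub_cases, sub_controls))
    return total / n_draws


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    controls = rng.normal(0.0, 1.0, size=40)
    # Homogeneous gene: every case is shifted in the same direction.
    homogeneous = rng.normal(2.0, 1.0, size=40)
    # Heterogeneous gene: half of the cases shift up, half shift down.
    heterogeneous = np.concatenate(
        [rng.normal(3.0, 1.0, size=20), rng.normal(-3.0, 1.0, size=20)]
    )
    print("homogeneous:", sign_sum_sketch(homogeneous, controls))
    print("heterogeneous:", sign_sum_sketch(heterogeneous, controls))
```

Under these assumptions, a gene whose cases all shift in one direction keeps the same sign in nearly every subset, while a gene with two opposing case subgroups flips sign from subset to subset, which is the instability the abstract attributes to disease heterogeneity. Ranking genes by the absolute value of this score would push such unstable genes down the list.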