Differential gene expression in disease: a comparison between high-throughput studies and the literature

Bibliographic Details
Main authors: Rodriguez-Esteban, Raul; Jiang, Xiaoyu
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5637346/
https://www.ncbi.nlm.nih.gov/pubmed/29020950
http://dx.doi.org/10.1186/s12920-017-0293-y
author Rodriguez-Esteban, Raul
Jiang, Xiaoyu
collection PubMed
description BACKGROUND: Differential gene expression is important for understanding the biological differences between healthy and diseased states. Two common sources of differential gene expression data are microarray studies and the biomedical literature. METHODS: With the aid of text mining and gene expression analysis, we examined the comparative properties of these two sources of differential gene expression data. RESULTS: The literature shows a preference for reporting genes associated with higher fold changes in microarray data, rather than genes that are simply significantly differentially expressed. Thus, the resemblance between the literature and microarray data increases when the fold-change threshold for microarray data is increased. Moreover, the literature has a reporting preference for differentially expressed genes that (1) are overexpressed rather than underexpressed; (2) are overexpressed in multiple diseases; and (3) are popular in the biomedical literature at large. Additionally, the degree to which diseases are similar depends on whether microarray data or the literature is used to compare them. Finally, vaguely qualified reports of differential expression magnitudes in the literature correlate only weakly with microarray fold-change data. CONCLUSIONS: Reporting biases of differential gene expression in the literature may be affecting our appreciation of disease biology and of the degree of similarity that actually exists between different diseases. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12920-017-0293-y) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5637346
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5637346 2017-10-18 Differential gene expression in disease: a comparison between high-throughput studies and the literature Rodriguez-Esteban, Raul Jiang, Xiaoyu BMC Med Genomics Research Article
BioMed Central 2017-10-11 /pmc/articles/PMC5637346/ /pubmed/29020950 http://dx.doi.org/10.1186/s12920-017-0293-y Text en © The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Differential gene expression in disease: a comparison between high-throughput studies and the literature
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5637346/
https://www.ncbi.nlm.nih.gov/pubmed/29020950
http://dx.doi.org/10.1186/s12920-017-0293-y