GFS: fuzzy preprocessing for effective gene expression analysis

Bibliographic details
Main authors: Belorkar, Abha; Wong, Limsoon
Format: Online, Article, Text
Language: English
Published: BioMed Central, 2016
Subjects: Research
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5260137/
https://www.ncbi.nlm.nih.gov/pubmed/28155629
http://dx.doi.org/10.1186/s12859-016-1327-8
Collection: PubMed
Abstract:
BACKGROUND: Gene expression data produced on high-throughput platforms such as microarrays are susceptible to considerable variation that obscures useful biological information. Preprocessing the data with a suitable normalization method is therefore necessary, and it has a direct and major impact on the quality of downstream data analysis. However, standard normalization methods are known to perform poorly, especially in the presence of substantial batch effects and heterogeneity in gene expression data.
RESULTS: We present Gene Fuzzy Score (GFS), a simple preprocessing technique that largely reduces obscuring variation while retaining useful biological information. Using four sets of publicly available datasets containing batch effects and heterogeneity, we compare GFS with three standard normalization techniques as well as with raw gene expression. Each method is evaluated with respect to the quality, consistency, and biological coherence of its processed output. GFS outperforms the other transformation techniques in all three aspects.
CONCLUSION: Our approach to preprocessing is a stronger alternative to popular normalization techniques. We demonstrate that it achieves the essential goal of preprocessing: it makes expression values from multiple samples comparable, even when the samples come from separate platforms or independent batches, or belong to a heterogeneous phenotype.
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: BMC Bioinformatics (article type: Research)
Publication date: 2016-12-23 (BioMed Central)
Rights: © The Author(s) 2016. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.