GFS: fuzzy preprocessing for effective gene expression analysis
BACKGROUND: Gene expression data produced on high-throughput platforms such as microarrays is susceptible to much variation that obscures useful biological information. Therefore, preprocessing data with a suitable normalization method is necessary, and has a direct and massive impact on the quality...
Main Authors: | Belorkar, Abha, Wong, Limsoon |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2016 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5260137/ https://www.ncbi.nlm.nih.gov/pubmed/28155629 http://dx.doi.org/10.1186/s12859-016-1327-8 |
_version_ | 1782499351430430720 |
---|---|
author | Belorkar, Abha Wong, Limsoon |
author_facet | Belorkar, Abha Wong, Limsoon |
author_sort | Belorkar, Abha |
collection | PubMed |
description | BACKGROUND: Gene expression data produced on high-throughput platforms such as microarrays is susceptible to much variation that obscures useful biological information. Therefore, preprocessing data with a suitable normalization method is necessary, and has a direct and massive impact on the quality of downstream data analysis. However, it is known that standard normalization methods perform poorly, especially in the presence of substantial batch effects and heterogeneity in gene expression data. RESULTS: We present Gene Fuzzy Score (GFS), a simple preprocessing technique that is able to largely reduce obscuring variation while retaining useful biological information. Using four sets of publicly available datasets containing batch effects and heterogeneity, we compare GFS with three standard normalization techniques as well as raw gene expression. Each method is evaluated with respect to the quality, consistency, and biological coherence of its processed output. It is found that GFS outperforms other transformation techniques in all three aspects. CONCLUSION: Our approach to preprocessing is a stronger alternative to popular normalization techniques. We demonstrate that it achieves the essential goal of preprocessing – it is effective at making expression values from multiple samples comparable, even when they are from separate platforms, in independent batches, or belong to a heterogeneous phenotype. |
format | Online Article Text |
id | pubmed-5260137 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-5260137 2017-01-30 GFS: fuzzy preprocessing for effective gene expression analysis Belorkar, Abha Wong, Limsoon BMC Bioinformatics Research BACKGROUND: Gene expression data produced on high-throughput platforms such as microarrays is susceptible to much variation that obscures useful biological information. Therefore, preprocessing data with a suitable normalization method is necessary, and has a direct and massive impact on the quality of downstream data analysis. However, it is known that standard normalization methods perform poorly, especially in the presence of substantial batch effects and heterogeneity in gene expression data. RESULTS: We present Gene Fuzzy Score (GFS), a simple preprocessing technique that is able to largely reduce obscuring variation while retaining useful biological information. Using four sets of publicly available datasets containing batch effects and heterogeneity, we compare GFS with three standard normalization techniques as well as raw gene expression. Each method is evaluated with respect to the quality, consistency, and biological coherence of its processed output. It is found that GFS outperforms other transformation techniques in all three aspects. CONCLUSION: Our approach to preprocessing is a stronger alternative to popular normalization techniques. We demonstrate that it achieves the essential goal of preprocessing – it is effective at making expression values from multiple samples comparable, even when they are from separate platforms, in independent batches, or belong to a heterogeneous phenotype. 
BioMed Central 2016-12-23 /pmc/articles/PMC5260137/ /pubmed/28155629 http://dx.doi.org/10.1186/s12859-016-1327-8 Text en © The Author(s) 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Belorkar, Abha Wong, Limsoon GFS: fuzzy preprocessing for effective gene expression analysis |
title | GFS: fuzzy preprocessing for effective gene expression analysis |
title_full | GFS: fuzzy preprocessing for effective gene expression analysis |
title_fullStr | GFS: fuzzy preprocessing for effective gene expression analysis |
title_full_unstemmed | GFS: fuzzy preprocessing for effective gene expression analysis |
title_short | GFS: fuzzy preprocessing for effective gene expression analysis |
title_sort | gfs: fuzzy preprocessing for effective gene expression analysis |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5260137/ https://www.ncbi.nlm.nih.gov/pubmed/28155629 http://dx.doi.org/10.1186/s12859-016-1327-8 |
work_keys_str_mv | AT belorkarabha gfsfuzzypreprocessingforeffectivegeneexpressionanalysis AT wonglimsoon gfsfuzzypreprocessingforeffectivegeneexpressionanalysis |