
Batch-normalization of cerebellar and medulloblastoma gene expression datasets utilizing empirically defined negative control genes

MOTIVATION: Medulloblastoma (MB) is a brain cancer predominantly arising in children. Roughly 70% of patients are cured today, but survivors often suffer from severe sequelae. MB has been extensively studied by molecular profiling, but often in small and scattered cohorts. To improve cure rates and...


Bibliographic Details
Main authors: Weishaupt, Holger; Johansson, Patrik; Sundström, Anders; Lubovac-Pilav, Zelmina; Olsson, Björn; Nelander, Sven; Swartling, Fredrik J
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6748729/
https://www.ncbi.nlm.nih.gov/pubmed/30715209
http://dx.doi.org/10.1093/bioinformatics/btz066
author Weishaupt, Holger
Johansson, Patrik
Sundström, Anders
Lubovac-Pilav, Zelmina
Olsson, Björn
Nelander, Sven
Swartling, Fredrik J
collection PubMed
description MOTIVATION: Medulloblastoma (MB) is a brain cancer predominantly arising in children. Roughly 70% of patients are cured today, but survivors often suffer from severe sequelae. MB has been extensively studied by molecular profiling, but often in small and scattered cohorts. To improve cure rates and reduce treatment side effects, accurate integration of such data to increase analytical power will be important, if not essential. RESULTS: We have integrated 23 transcription datasets, spanning 1350 MB and 291 normal brain samples. To remove batch effects, we combined the Removal of Unwanted Variation (RUV) method with a novel pipeline for determining empirical negative control genes and a panel of metrics to evaluate normalization performance. The documented approach enabled the removal of a majority of batch effects, producing a large-scale, integrative dataset of MB and cerebellar expression data. The proposed strategy will be broadly applicable for accurate integration of data and incorporation of normal reference samples for studies of various diseases. We hope that the integrated dataset will improve current research in the field of MB by allowing more large-scale gene expression analyses. AVAILABILITY AND IMPLEMENTATION: The RUV-normalized expression data is available through the Gene Expression Omnibus (GEO; https://www.ncbi.nlm.nih.gov/geo/) and can be accessed via the GSE series number GSE124814. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-6748729
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6748729 2019-09-23 Bioinformatics Original Papers
Oxford University Press 2019-09-15 2019-02-01 /pmc/articles/PMC6748729/ /pubmed/30715209 http://dx.doi.org/10.1093/bioinformatics/btz066 Text en © The Author(s) 2019. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Batch-normalization of cerebellar and medulloblastoma gene expression datasets utilizing empirically defined negative control genes
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6748729/
https://www.ncbi.nlm.nih.gov/pubmed/30715209
http://dx.doi.org/10.1093/bioinformatics/btz066