
Influence of batch effect correction methods on drug induced differential gene expression profiles

BACKGROUND: Batch effects were not accounted for in most of the studies of computational drug repositioning based on gene expression signatures. It is unknown how batch effect removal methods impact the results of signature-based drug repositioning. Herein, we conducted differential analyses on the...


Bibliographic Details
Main Authors: Zhou, Wei; Koudijs, Karel K. M.; Böhringer, Stefan
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6706913/
https://www.ncbi.nlm.nih.gov/pubmed/31438848
http://dx.doi.org/10.1186/s12859-019-3028-6
author Zhou, Wei
Koudijs, Karel K. M.
Böhringer, Stefan
collection PubMed
description BACKGROUND: Most studies of computational drug repositioning based on gene expression signatures have not accounted for batch effects, and it is unknown how batch effect removal methods affect the results of signature-based drug repositioning. We conducted differential expression analyses on the Connectivity Map (CMAP) database using several batch effect correction methods, both to evaluate their influence on computational drug repositioning with microarray data and to compare the methods with one another. RESULTS: Average signature size differed between methods. The gene signatures identified by the Latent Effect Adjustment after Primary Projection (LEAPP) method and by the methods fitted with the Linear Models for Microarray Data (limma) software showed little agreement. The external validity of the gene signatures was evaluated by connectivity mapping between the CMAP database and the Library of Integrated Network-based Cellular Signatures (LINCS) database. Connectivity mapping indicated that the identified genes were not reliable for drugs with a total sample size (drug + control samples) smaller than 40, irrespective of the batch effect correction method applied. With a total sample size larger than 40, the methods correcting for batch effects produced significantly better results than the method without batch effect correction. In a simulation study, power was generally low for simulated data with a sample size smaller than 40. The best performance was observed with the limma method correcting for two principal components. CONCLUSION: When the sample size is large enough to contain sufficient information, batch effect correction methods strongly affect differential gene expression analysis and, in turn, downstream drug repositioning. We recommend including two or three principal components as covariates when fitting models with limma, provided the sample size is sufficient (more than 40 drug and control samples combined). ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-019-3028-6) contains supplementary material, which is available to authorized users.
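The recommendation in the abstract, adding the leading principal components as covariates in a per-gene linear model, can be illustrated with a minimal sketch. This is not the authors' code and it is not limma: it uses plain ordinary least squares in NumPy and omits limma's empirical Bayes variance moderation. The function names, the genes-by-samples matrix layout, and the 0/1 `treated` vector are assumptions made for illustration only.

```python
import numpy as np


def top_principal_components(expr, n_pcs=2):
    """Return sample scores on the leading principal components.

    expr: array of shape (n_genes, n_samples) (assumed layout).
    """
    # Center each gene across samples before the decomposition.
    centered = expr - expr.mean(axis=1, keepdims=True)
    # Rows of vt are unit-length directions in sample space.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    # Scale by the singular values to obtain the sample scores.
    return (vt[:n_pcs] * s[:n_pcs, None]).T  # shape (n_samples, n_pcs)


def fit_with_pc_covariates(expr, treated, n_pcs=2):
    """Per-gene OLS of expression on treatment plus leading PCs.

    treated: 0/1 array of length n_samples (1 = drug, 0 = control).
    Returns the estimated treatment coefficient for each gene.
    """
    n_samples = expr.shape[1]
    pcs = top_principal_components(expr, n_pcs)
    # Design matrix columns: intercept, treatment indicator, PC scores.
    design = np.column_stack(
        [np.ones(n_samples), np.asarray(treated, dtype=float), pcs]
    )
    # One least-squares solve fits all genes at once.
    coef, *_ = np.linalg.lstsq(design, expr.T, rcond=None)
    return coef[1]
```

One caveat with this sketch: the principal components are computed from the same data that contain the drug effect, so a strong treatment signal can load onto a leading component and absorb part of the treatment coefficient. How the article handles this is not stated in the abstract.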
format Online
Article
Text
id pubmed-6706913
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6706913 2019-08-28 Influence of batch effect correction methods on drug induced differential gene expression profiles Zhou, Wei Koudijs, Karel K. M. Böhringer, Stefan BMC Bioinformatics Methodology Article BioMed Central 2019-08-22 /pmc/articles/PMC6706913/ /pubmed/31438848 http://dx.doi.org/10.1186/s12859-019-3028-6 Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Influence of batch effect correction methods on drug induced differential gene expression profiles
topic Methodology Article