Combining location-and-scale batch effect adjustment with data cleaning by latent factor adjustment

BACKGROUND: In the context of high-throughput molecular data analysis it is common that the observations included in a dataset form distinct groups; for example, measured at different times, under different conditions or even in different labs. These groups are generally denoted as batches. Systemat...

Bibliographic Details
Main authors: Hornung, Roman; Boulesteix, Anne-Laure; Causeur, David
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4710051/
https://www.ncbi.nlm.nih.gov/pubmed/26753519
http://dx.doi.org/10.1186/s12859-015-0870-z
collection PubMed
description BACKGROUND: In the context of high-throughput molecular data analysis it is common that the observations included in a dataset form distinct groups; for example, measured at different times, under different conditions or even in different labs. These groups are generally denoted as batches. Systematic differences between these batches not attributable to the biological signal of interest are denoted as batch effects. If ignored when conducting analyses on the combined data, batch effects can lead to distortions in the results. In this paper we present FAbatch, a general, model-based method for correcting for such batch effects in the case of an analysis involving a binary target variable. It is a combination of two commonly used approaches: location-and-scale adjustment and data cleaning by adjustment for distortions due to latent factors. We compare FAbatch extensively to the most commonly applied competitors on the basis of several performance metrics. FAbatch can also be used in the context of prediction modelling to eliminate batch effects from new test data. This important application is illustrated using real and simulated data. We implemented FAbatch and various other functionalities in the R package bapred available online from CRAN.
RESULTS: FAbatch is seen to be competitive in many cases and above average in others. In our analyses, the only cases where it failed to adequately preserve the biological signal were when there were extremely outlying batches and when the batch effects were very weak compared to the biological signal.
CONCLUSIONS: As seen in this paper, batch effect structures found in real datasets are diverse. Current batch effect adjustment methods are often either too simplistic or make restrictive assumptions, which can be violated in real datasets. Due to the generality of its underlying model and its ability to perform well, FAbatch represents a reliable tool for batch effect adjustment for most situations found in practice.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0870-z) contains supplementary material, which is available to authorized users.
id pubmed-4710051
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-4710051 2016-01-13 BMC Bioinformatics Methodology Article BioMed Central 2016-01-12 /pmc/articles/PMC4710051/ /pubmed/26753519 http://dx.doi.org/10.1186/s12859-015-0870-z Text en
© Hornung et al. 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
topic Methodology Article