Combining location-and-scale batch effect adjustment with data cleaning by latent factor adjustment
Main authors: | Hornung, Roman; Boulesteix, Anne-Laure; Causeur, David |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2016 |
Subjects: | Methodology Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4710051/ https://www.ncbi.nlm.nih.gov/pubmed/26753519 http://dx.doi.org/10.1186/s12859-015-0870-z |
author | Hornung, Roman Boulesteix, Anne-Laure Causeur, David |
collection | PubMed |
description | BACKGROUND: In the context of high-throughput molecular data analysis it is common that the observations included in a dataset form distinct groups; for example, measured at different times, under different conditions or even in different labs. These groups are generally denoted as batches. Systematic differences between these batches not attributable to the biological signal of interest are denoted as batch effects. If ignored when conducting analyses on the combined data, batch effects can lead to distortions in the results. In this paper we present FAbatch, a general, model-based method for correcting for such batch effects in the case of an analysis involving a binary target variable. It is a combination of two commonly used approaches: location-and-scale adjustment and data cleaning by adjustment for distortions due to latent factors. We compare FAbatch extensively to the most commonly applied competitors on the basis of several performance metrics. FAbatch can also be used in the context of prediction modelling to eliminate batch effects from new test data. This important application is illustrated using real and simulated data. We implemented FAbatch and various other functionalities in the R package bapred, available online from CRAN. RESULTS: FAbatch is seen to be competitive in many cases and above average in others. In our analyses, the only cases where it failed to adequately preserve the biological signal were when there were extremely outlying batches and when the batch effects were very weak compared to the biological signal. CONCLUSIONS: As seen in this paper, batch effect structures found in real datasets are diverse. Current batch effect adjustment methods are often either too simplistic or make restrictive assumptions, which can be violated in real datasets. Due to the generality of its underlying model and its ability to perform well, FAbatch represents a reliable tool for batch effect adjustment for most situations found in practice. 
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0870-z) contains supplementary material, which is available to authorized users. |
id | pubmed-4710051 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
journal | BMC Bioinformatics (Methodology Article), BioMed Central, published 2016-01-12 |
rights | © Hornung et al. 2016. Open Access: this article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
topic | Methodology Article |
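To illustrate the location-and-scale component named in the description, the following is a minimal Python sketch that standardizes each feature separately within each batch (mean 0, population standard deviation 1). It is a deliberately simplified stand-in, not FAbatch and not the bapred API: it ignores the binary target variable and omits the latent-factor adjustment entirely, and the function name `standardize_within_batches` is hypothetical.

```python
from statistics import mean, pstdev


def standardize_within_batches(data, batches):
    """Simplified location-and-scale batch adjustment (illustrative only).

    data    -- list of samples, each a list of feature values
    batches -- batch label for each sample (same length as data)

    Every feature is centred to mean 0 and scaled to population standard
    deviation 1 separately within each batch, which removes batch-specific
    shifts (location) and spreads (scale) of that feature.
    """
    if len(data) != len(batches):
        raise ValueError("data and batches must have the same length")
    n_features = len(data[0])
    adjusted = [list(row) for row in data]
    for batch in set(batches):
        # Indices of the samples belonging to this batch.
        rows = [i for i, label in enumerate(batches) if label == batch]
        for j in range(n_features):
            values = [data[i][j] for i in rows]
            loc = mean(values)
            scale = pstdev(values)
            for i in rows:
                # Guard against features that are constant within a batch.
                adjusted[i][j] = (data[i][j] - loc) / scale if scale > 0 else 0.0
    return adjusted


if __name__ == "__main__":
    # Two batches whose features differ in level and, for feature 2, in spread.
    samples = [[1.0, 10.0], [3.0, 20.0], [11.0, 5.0], [13.0, 7.0]]
    labels = ["A", "A", "B", "B"]
    print(standardize_within_batches(samples, labels))
    # prints [[-1.0, -1.0], [1.0, 1.0], [-1.0, -1.0], [1.0, 1.0]]
```

After adjustment the two batches become indistinguishable in this toy example. One caveat of such naive per-batch standardization is that, if the proportions of the two target classes differ between batches, centering each batch separately can also remove part of the biological signal.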