
Blind estimation and correction of microarray batch effect

Microarray batch effect (BE) has been the primary bottleneck for large-scale integration of data from multiple experiments. Current BE correction methods either need known batch identities (ComBat) or have the potential to overcorrect, by removing true but unknown biological differences (Surrogate V...


Bibliographic Details
Main author: Varma, Sudhir
Format: Online Article Text
Language: English
Published: Public Library of Science 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7145015/
https://www.ncbi.nlm.nih.gov/pubmed/32271844
http://dx.doi.org/10.1371/journal.pone.0231446
author Varma, Sudhir
collection PubMed
description Microarray batch effect (BE) has been the primary bottleneck for large-scale integration of data from multiple experiments. Current BE correction methods either need known batch identities (ComBat) or have the potential to overcorrect by removing true but unknown biological differences (Surrogate Variable Analysis, SVA). It is well known that experimental conditions such as array or reagent batches, PCR amplification or ozone levels can affect the measured expression levels; often the direction of perturbation of the measured expression is the same in different datasets. However, there are no BE correction algorithms that attempt to estimate the individual effects of technical differences and use them to correct expression data. In this manuscript, we show that a set of signatures, each of which is a vector whose length equals the number of probes, calculated on a reference set of microarray samples can predict much of the batch effect in other validation sets. We present a rationale for selecting a reference set of samples designed to estimate technical differences without removing biological differences. Putting both together, we introduce the Batch Effect Signature Correction (BESC) algorithm, which uses the batch effect signatures (BES) calculated on the reference set to efficiently predict and remove BE. Using two independent validation sets, we show that BESC is capable of removing batch effect without removing unknown but true biological differences. Much of the variation due to batch effect is shared between different microarray datasets. That shared information can be used to predict signatures (i.e., directions of perturbation) due to batch effect in new datasets. The correction can be precomputed without using the samples to be corrected (blind), is applied to each sample individually (single sample), and corrects only known technical effects without removing known or unknown biological differences (conservative). Those three characteristics make it ideal for high-throughput correction of samples for a microarray data repository. We also compare the performance of BESC to three other batch correction methods: SVA, Removing Unwanted Variation (RUV) and Hidden Covariates with Prior (HCP). An R package, besc, implementing the algorithm is available from http://explainbio.com.
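The description above says that signatures precomputed on a reference set are used to correct each new sample on its own. The abstract does not state how the correction itself is computed, so the following is only a minimal sketch under one assumption: that the contribution of each signature is removed by orthogonal projection. The function names `orthonormalize` and `correct_sample` are hypothetical and are not taken from the besc package.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def orthonormalize(signatures: Sequence[Sequence[float]], eps: float = 1e-12) -> List[Vector]:
    """Gram-Schmidt: turn the signatures into an orthonormal basis.

    Hypothetical helper; signatures that are (numerically) linear
    combinations of earlier ones are dropped.
    """
    basis: List[Vector] = []
    for sig in signatures:
        v = list(sig)
        for b in basis:
            c = dot(v, b)
            v = [vi - c * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(dot(v, v))
        if norm > eps:
            basis.append([vi / norm for vi in v])
    return basis


def correct_sample(sample: Sequence[float], signatures: Sequence[Sequence[float]]) -> Vector:
    """Remove from one sample its component along every signature.

    Assumption (not from the source): correction is an orthogonal
    projection onto the complement of the signature subspace.
    Only the single sample and the precomputed signatures are used,
    so the step is "blind" and "single sample" in the abstract's sense.
    """
    corrected = list(sample)
    for b in orthonormalize(signatures):
        c = dot(corrected, b)  # estimated strength of this signature in the sample
        corrected = [x - c * bi for x, bi in zip(corrected, b)]
    return corrected


# Toy usage: one signature over four probes that perturbs the first two probes together.
signatures = [[1.0, 1.0, 0.0, 0.0]]
sample = [3.0, 1.0, 5.0, 2.0]
corrected = correct_sample(sample, signatures)
```

In this toy case the shared shift of the first two probes is removed while their difference and the other two probes are left untouched, which mirrors the "conservative" property described above: only directions listed as signatures are altered.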
format Online
Article
Text
id pubmed-7145015
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-7145015 2020-04-14 Blind estimation and correction of microarray batch effect Varma, Sudhir PLoS One Research Article Public Library of Science 2020-04-09 /pmc/articles/PMC7145015/ /pubmed/32271844 http://dx.doi.org/10.1371/journal.pone.0231446 Text en © 2020 Sudhir Varma http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Blind estimation and correction of microarray batch effect
topic Research Article