
ComBat-seq: batch effect adjustment for RNA-seq count data

The benefit of integrating batches of genomic data to increase statistical power is often hindered by batch effects, or unwanted variation in data caused by differences in technical factors across batches. It is therefore critical to effectively address batch effects in genomic data to overcome thes...


Bibliographic Details
Main Authors: Zhang, Yuqing; Parmigiani, Giovanni; Johnson, W Evan
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7518324/
https://www.ncbi.nlm.nih.gov/pubmed/33015620
http://dx.doi.org/10.1093/nargab/lqaa078
author Zhang, Yuqing
Parmigiani, Giovanni
Johnson, W Evan
collection PubMed
description The benefit of integrating batches of genomic data to increase statistical power is often hindered by batch effects, or unwanted variation in data caused by differences in technical factors across batches. It is therefore critical to effectively address batch effects in genomic data to overcome these challenges. Many existing methods for batch effect adjustment assume the data follow a continuous, bell-shaped Gaussian distribution. However, in RNA-seq studies the data are typically skewed, over-dispersed counts, so this assumption is not appropriate and may lead to erroneous results. Negative binomial regression models have been used previously to better capture the properties of counts. We developed a batch correction method, ComBat-seq, using a negative binomial regression model that retains the integer nature of count data in RNA-seq studies, making the batch-adjusted data compatible with common differential expression software packages that require integer counts. We show in realistic simulations that ComBat-seq-adjusted data result in better statistical power and control of false positives in differential expression compared to data adjusted by the other available methods. We further demonstrate in a real data example that ComBat-seq successfully removes batch effects and recovers the biological signal in the data.
format Online
Article
Text
id pubmed-7518324
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-7518324 2020-09-30 ComBat-seq: batch effect adjustment for RNA-seq count data. Zhang, Yuqing; Parmigiani, Giovanni; Johnson, W Evan. NAR Genom Bioinform. Methods Article. Oxford University Press 2020-09-21 /pmc/articles/PMC7518324/ /pubmed/33015620 http://dx.doi.org/10.1093/nargab/lqaa078 Text en © The Author(s) 2019. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics.
http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title ComBat-seq: batch effect adjustment for RNA-seq count data
topic Methods Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7518324/
https://www.ncbi.nlm.nih.gov/pubmed/33015620
http://dx.doi.org/10.1093/nargab/lqaa078