Normalization benchmark of ATAC-seq datasets shows the importance of accounting for GC-content effects

Bibliographic Details
Main authors: Van den Berge, Koen; Chou, Hsin-Jung; Roux de Bézieux, Hector; Street, Kelly; Risso, Davide; Ngai, John; Dudoit, Sandrine
Format: Online Article Text
Language: English
Published: Elsevier 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9701614/
https://www.ncbi.nlm.nih.gov/pubmed/36452861
http://dx.doi.org/10.1016/j.crmeth.2022.100321
Collection: PubMed
Abstract: The assay for transposase-accessible chromatin using sequencing (ATAC-seq) allows the study of epigenetic regulation of gene expression by assessing chromatin configuration for an entire genome. Despite its popularity, there have been limited studies investigating the analytical challenges related to ATAC-seq data, with most studies leveraging tools developed for bulk transcriptome sequencing. Here, we show that GC-content effects are omnipresent in ATAC-seq datasets. Since the GC-content effects are sample specific, they can bias downstream analyses such as clustering and differential accessibility analysis. We introduce a normalization method based on smooth-quantile normalization within GC-content bins and evaluate it together with 11 different normalization procedures on 8 public ATAC-seq datasets. Accounting for GC-content effects in the normalization is crucial for common downstream ATAC-seq data analyses, improving accuracy and interpretability. Through case studies, we show that exploratory data analysis is essential to guide the choice of an appropriate normalization method for a given dataset.
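The abstract's central idea is to normalize counts within GC-content bins, so that sample-specific GC effects are corrected bin by bin rather than globally. The sketch below is a simplified illustration, not the authors' method: it swaps in plain full-quantile normalization for smooth-quantile normalization, and it assumes a peaks-by-samples count matrix, a per-peak GC fraction, hypothetical function names, and an arbitrary default of 10 equal-frequency GC bins.

```python
import numpy as np


def quantile_normalize(counts):
    """Full quantile normalization of a (features x samples) matrix.

    Every column is given the same distribution: the mean of the sorted
    columns. Values are placed back according to each column's ranks
    (ties are broken by row order via a stable sort).
    """
    counts = np.asarray(counts, dtype=float)
    order = np.argsort(counts, axis=0, kind="stable")
    sorted_vals = np.take_along_axis(counts, order, axis=0)
    reference = sorted_vals.mean(axis=1)  # shared target distribution
    normalized = np.empty_like(counts)
    for j in range(counts.shape[1]):
        # the r-th smallest value in column j receives reference[r]
        normalized[order[:, j], j] = reference
    return normalized


def gc_binned_quantile_normalize(counts, gc, n_bins=10):
    """Hypothetical helper: quantile-normalize peaks separately per GC bin.

    Simplification (assumption): plain quantile normalization stands in
    for the smooth-quantile normalization described in the abstract.

    counts: (peaks x samples) count matrix (assumed layout)
    gc:     GC fraction of each peak sequence, in [0, 1]
    n_bins: number of equal-frequency GC bins (arbitrary default)
    """
    counts = np.asarray(counts, dtype=float)
    gc = np.asarray(gc, dtype=float)
    # Bin edges at GC quantiles, so each bin holds roughly the same number of peaks.
    edges = np.quantile(gc, np.linspace(0.0, 1.0, n_bins + 1))
    bins = np.clip(np.searchsorted(edges, gc, side="right") - 1, 0, n_bins - 1)
    out = np.empty_like(counts)
    for b in range(n_bins):
        rows = np.flatnonzero(bins == b)
        if rows.size:  # tied edges can leave a bin empty
            out[rows] = quantile_normalize(counts[rows])
    return out
```

Because each bin is normalized on its own, a sample whose counts drift upward only in GC-rich peaks is adjusted only within those peaks, which a single global scaling factor cannot do. Smooth-quantile normalization, as we understand it, additionally relaxes the assumption that all samples share one distribution, letting biological groups retain distinct distributions where the data support it.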
ID: pubmed-9701614
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Cell Rep Methods
Date: 2022-11-01
Copyright: © 2022 The Author(s). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).