Normalization benchmark of ATAC-seq datasets shows the importance of accounting for GC-content effects
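Main authors: | Van den Berge, Koen; Chou, Hsin-Jung; Roux de Bézieux, Hector; Street, Kelly; Risso, Davide; Ngai, John; Dudoit, Sandrine |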
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Elsevier, 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9701614/, https://www.ncbi.nlm.nih.gov/pubmed/36452861, http://dx.doi.org/10.1016/j.crmeth.2022.100321 |
Field | Value |
---|---|
author | Van den Berge, Koen; Chou, Hsin-Jung; Roux de Bézieux, Hector; Street, Kelly; Risso, Davide; Ngai, John; Dudoit, Sandrine |
collection | PubMed |
description | The assay for transposase-accessible chromatin using sequencing (ATAC-seq) allows the study of epigenetic regulation of gene expression by assessing chromatin configuration for an entire genome. Despite its popularity, there have been limited studies investigating the analytical challenges related to ATAC-seq data, with most studies leveraging tools developed for bulk transcriptome sequencing. Here, we show that GC-content effects are omnipresent in ATAC-seq datasets. Since the GC-content effects are sample specific, they can bias downstream analyses such as clustering and differential accessibility analysis. We introduce a normalization method based on smooth-quantile normalization within GC-content bins and evaluate it together with 11 different normalization procedures on 8 public ATAC-seq datasets. Accounting for GC-content effects in the normalization is crucial for common downstream ATAC-seq data analyses, improving accuracy and interpretability. Through case studies, we show that exploratory data analysis is essential to guide the choice of an appropriate normalization method for a given dataset. |
format | Online Article Text |
id | pubmed-9701614 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Elsevier |
record_format | MEDLINE/PubMed |
source | pubmed-9701614 (2022-11-29). Cell Rep Methods (Cell Reports Methods), Article. Elsevier, 2022-11-01. © 2022 The Author(s). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). |
title | Normalization benchmark of ATAC-seq datasets shows the importance of accounting for GC-content effects |
topic | Article |
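
The abstract's central idea, normalizing ATAC-seq counts separately within GC-content bins so that sample-specific GC effects are removed stratum by stratum, can be illustrated with a minimal sketch. This is not the authors' implementation: it uses ordinary quantile normalization rather than smooth-quantile normalization, and the helper names (`gc_fraction`, `quantile_normalize`, `gc_binned_quantile_normalize`), the equal-width bins, and the default of 10 bins are illustrative assumptions.

```python
from statistics import fmean
from typing import Dict, List, Sequence


def gc_fraction(seq: str) -> float:
    """GC fraction of a peak sequence, computed over A/C/G/T bases only (N is ignored)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence has no A/C/G/T bases")
    return (seq.count("G") + seq.count("C")) / acgt


def quantile_normalize(columns: Sequence[Sequence[float]]) -> List[List[float]]:
    """Standard (not smooth) quantile normalization; one list of values per sample.

    Each sample's values are replaced by the across-sample mean of the values
    at the same rank. Ties are broken by position, for simplicity.
    """
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the rank-r values across samples.
    reference = [fmean(col[r] for col in sorted_cols) for r in range(n)]
    normalized = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


def gc_binned_quantile_normalize(
    counts: Sequence[Sequence[float]], gc: Sequence[float], n_bins: int = 10
) -> List[List[float]]:
    """Quantile-normalize counts separately within equal-width GC-content bins.

    counts[p][s] is the (e.g. log-transformed) count of peak p in sample s;
    gc[p] is that peak's GC fraction in [0, 1]. Binning scheme is an assumption.
    """
    n_samples = len(counts[0])
    bins: Dict[int, List[int]] = {}
    for peak, g in enumerate(gc):
        b = min(int(g * n_bins), n_bins - 1)  # a GC fraction of 1.0 goes in the top bin
        bins.setdefault(b, []).append(peak)

    result = [[0.0] * n_samples for _ in counts]
    for peaks in bins.values():
        # Normalize only the peaks that share this GC bin.
        columns = [[counts[p][s] for p in peaks] for s in range(n_samples)]
        normed = quantile_normalize(columns)
        for s in range(n_samples):
            for j, p in enumerate(peaks):
                result[p][s] = normed[s][j]
    return result
```

Binning first means each peak is only compared with peaks of similar GC content, so a sample whose high-GC peaks are systematically inflated is corrected within that stratum rather than across the whole genome. Forcing identical distributions within every bin is a strong assumption; smooth-quantile normalization relaxes it by letting the reference distribution differ between biological groups, which this sketch does not attempt.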