Unsupervised contrastive peak caller for ATAC-seq
The assay for transposase-accessible chromatin with sequencing (ATAC-seq) is a common assay to identify chromatin accessible regions by using a Tn5 transposase that can access, cut, and ligate adapters to DNA fragments for subsequent amplification and sequencing. These sequenced regions are quantifi...
Main authors: | Vu, Ha T.H.; Zhang, Yudi; Tuteja, Geetu; Dorman, Karin S. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Cold Spring Harbor Laboratory Press 2023 |
Subjects: | Methods |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10538491/ https://www.ncbi.nlm.nih.gov/pubmed/37217250 http://dx.doi.org/10.1101/gr.277677.123 |
_version_ | 1785113317688213504 |
---|---|
author | Vu, Ha T.H. Zhang, Yudi Tuteja, Geetu Dorman, Karin S. |
author_facet | Vu, Ha T.H. Zhang, Yudi Tuteja, Geetu Dorman, Karin S. |
author_sort | Vu, Ha T.H. |
collection | PubMed |
description | The assay for transposase-accessible chromatin with sequencing (ATAC-seq) is a common assay to identify chromatin accessible regions by using a Tn5 transposase that can access, cut, and ligate adapters to DNA fragments for subsequent amplification and sequencing. These sequenced regions are quantified and tested for enrichment in a process referred to as “peak calling.” Most unsupervised peak calling methods are based on simple statistical models and suffer from elevated false positive rates. Newly developed supervised deep learning methods can be successful, but they rely on high quality labeled data for training, which can be difficult to obtain. Moreover, though biological replicates are recognized to be important, there are no established approaches for using replicates in the deep learning tools, and the approaches available for traditional methods either cannot be applied to ATAC-seq, where control samples may be unavailable, or are post hoc and do not capitalize on potentially complex, but reproducible signal in the read enrichment data. Here, we propose a novel peak caller that uses unsupervised contrastive learning to extract shared signals from multiple replicates. Raw coverage data are encoded to obtain low-dimensional embeddings and optimized to minimize a contrastive loss over biological replicates. These embeddings are passed to another contrastive loss for learning and predicting peaks and decoded to denoised data under an autoencoder loss. We compared our replicative contrastive learner (RCL) method with other existing methods on ATAC-seq data, using annotations from ChromHMM genomic labels and transcription factor ChIP-seq as noisy truth. RCL consistently achieved the best performance. |
format | Online Article Text |
id | pubmed-10538491 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Cold Spring Harbor Laboratory Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-10538491 2023-09-29 Unsupervised contrastive peak caller for ATAC-seq Vu, Ha T.H. Zhang, Yudi Tuteja, Geetu Dorman, Karin S. Genome Res Methods The assay for transposase-accessible chromatin with sequencing (ATAC-seq) is a common assay to identify chromatin accessible regions by using a Tn5 transposase that can access, cut, and ligate adapters to DNA fragments for subsequent amplification and sequencing. These sequenced regions are quantified and tested for enrichment in a process referred to as “peak calling.” Most unsupervised peak calling methods are based on simple statistical models and suffer from elevated false positive rates. Newly developed supervised deep learning methods can be successful, but they rely on high quality labeled data for training, which can be difficult to obtain. Moreover, though biological replicates are recognized to be important, there are no established approaches for using replicates in the deep learning tools, and the approaches available for traditional methods either cannot be applied to ATAC-seq, where control samples may be unavailable, or are post hoc and do not capitalize on potentially complex, but reproducible signal in the read enrichment data. Here, we propose a novel peak caller that uses unsupervised contrastive learning to extract shared signals from multiple replicates. Raw coverage data are encoded to obtain low-dimensional embeddings and optimized to minimize a contrastive loss over biological replicates. These embeddings are passed to another contrastive loss for learning and predicting peaks and decoded to denoised data under an autoencoder loss. We compared our replicative contrastive learner (RCL) method with other existing methods on ATAC-seq data, using annotations from ChromHMM genomic labels and transcription factor ChIP-seq as noisy truth. RCL consistently achieved the best performance. Cold Spring Harbor Laboratory Press 2023-07 /pmc/articles/PMC10538491/ /pubmed/37217250 http://dx.doi.org/10.1101/gr.277677.123 Text en © 2023 Vu et al.; Published by Cold Spring Harbor Laboratory Press https://creativecommons.org/licenses/by/4.0/ This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Methods Vu, Ha T.H. Zhang, Yudi Tuteja, Geetu Dorman, Karin S. Unsupervised contrastive peak caller for ATAC-seq |
title | Unsupervised contrastive peak caller for ATAC-seq |
title_full | Unsupervised contrastive peak caller for ATAC-seq |
title_fullStr | Unsupervised contrastive peak caller for ATAC-seq |
title_full_unstemmed | Unsupervised contrastive peak caller for ATAC-seq |
title_short | Unsupervised contrastive peak caller for ATAC-seq |
title_sort | unsupervised contrastive peak caller for atac-seq |
topic | Methods |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10538491/ https://www.ncbi.nlm.nih.gov/pubmed/37217250 http://dx.doi.org/10.1101/gr.277677.123 |
work_keys_str_mv | AT vuhath unsupervisedcontrastivepeakcallerforatacseq AT zhangyudi unsupervisedcontrastivepeakcallerforatacseq AT tutejageetu unsupervisedcontrastivepeakcallerforatacseq AT dormankarins unsupervisedcontrastivepeakcallerforatacseq |
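
The description field in this record mentions embeddings that are "optimized to minimize a contrastive loss over biological replicates." As a rough illustration of that general idea only, the sketch below computes a generic InfoNCE-style loss with NumPy. It is not the authors' RCL implementation: the function names, the temperature value, and the synthetic embeddings are all assumptions made for this example, and the record does not specify the actual loss used in the paper.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    norm = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norm, eps)


def replicate_contrastive_loss(z_a, z_b, temperature=0.1):
    """Generic InfoNCE-style loss (hypothetical helper, not taken from RCL).

    Row i of z_a and row i of z_b are assumed to embed the same genomic
    region in two replicates (a positive pair); every other row of z_b is
    treated as a negative for row i of z_a.
    """
    z_a = l2_normalize(z_a)
    z_b = l2_normalize(z_b)
    logits = z_a @ z_b.T / temperature                   # (n, n) scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Average negative log-probability assigned to the matching region.
    return float(-np.mean(np.diag(log_prob)))


# Synthetic stand-ins for region embeddings (assumed data, not real ATAC-seq).
rng = np.random.default_rng(0)
shared = rng.normal(size=(32, 16))                     # signal common to both replicates
rep1 = shared + 0.05 * rng.normal(size=shared.shape)   # replicate 1 = signal + small noise
rep2 = shared + 0.05 * rng.normal(size=shared.shape)   # replicate 2 = signal + small noise
unrelated = rng.normal(size=shared.shape)              # embeddings sharing no signal

loss_matched = replicate_contrastive_loss(rep1, rep2)
loss_unrelated = replicate_contrastive_loss(rep1, unrelated)
print(loss_matched, loss_unrelated)
```

In this toy setting the loss is lower when the two inputs share signal across replicates than when they do not, which is the property a replicate-based contrastive objective relies on.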