Unsupervised Contrastive Peak Caller for ATAC-seq


Bibliographic Details
Main Authors: Vu, Ha T.H., Zhang, Yudi, Tuteja, Geetu, Dorman, Karin
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9881890/
https://www.ncbi.nlm.nih.gov/pubmed/36712015
http://dx.doi.org/10.1101/2023.01.07.523108
author Vu, Ha T.H.
Zhang, Yudi
Tuteja, Geetu
Dorman, Karin
collection PubMed
description The assay for transposase-accessible chromatin with sequencing (ATAC-seq) is a common assay to identify accessible chromatin regions by using a Tn5 transposase that can access, cut, and ligate adapters to DNA fragments for subsequent amplification and sequencing. These sequenced regions are quantified and tested for enrichment in a process referred to as “peak calling”. Most unsupervised peak calling methods are based on simple statistical models and suffer from elevated false positive rates. Newly developed supervised deep learning methods can be successful, but they rely on high-quality labeled data for training, which can be difficult to obtain. Moreover, though biological replicates are recognized to be important, there are no established approaches for using replicates in the deep learning tools, and the approaches available for traditional methods either cannot be applied to ATAC-seq, where control samples may be unavailable, or are post-hoc and do not capitalize on potentially complex but reproducible signal in the read enrichment data. Here, we propose a novel peak caller that uses unsupervised contrastive learning to extract shared signals from multiple replicates. Raw coverage data are encoded to obtain low-dimensional embeddings and optimized to minimize a contrastive loss over biological replicates. These embeddings are passed to another contrastive loss for learning and predicting peaks and decoded to denoised data under an autoencoder loss. We compared our Replicative Contrastive Learner (RCL) method with other existing methods on ATAC-seq data, using annotations from ChromHMM genome and transcription factor ChIP-seq as noisy truth. RCL consistently achieved the best performance.
format Online
Article
Text
id pubmed-9881890
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Cold Spring Harbor Laboratory
record_format MEDLINE/PubMed
spelling pubmed-9881890 2023-01-28 Unsupervised Contrastive Peak Caller for ATAC-seq Vu, Ha T.H. Zhang, Yudi Tuteja, Geetu Dorman, Karin bioRxiv Article Cold Spring Harbor Laboratory 2023-01-08 /pmc/articles/PMC9881890/ /pubmed/36712015 http://dx.doi.org/10.1101/2023.01.07.523108 Text en This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use.
title Unsupervised Contrastive Peak Caller for ATAC-seq
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9881890/
https://www.ncbi.nlm.nih.gov/pubmed/36712015
http://dx.doi.org/10.1101/2023.01.07.523108
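The description above mentions embeddings optimized under a contrastive loss over biological replicates, together with an autoencoder loss. The record does not give the form of either loss, so the following is only a minimal sketch under stated assumptions: an InfoNCE-style contrastive loss stands in for the replicate loss, mean squared error stands in for the autoencoder loss, and the linear encoder `W`, the function names, and the simulated coverage are all hypothetical. It is not the RCL implementation.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    norm = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norm, eps)


def replicate_contrastive_loss(z_a, z_b, temperature=0.5):
    """InfoNCE-style loss (an assumption, not the published RCL loss).

    Row i of z_a and row i of z_b are embeddings of the same genomic region
    in two replicates (a positive pair); all other rows act as negatives.
    """
    z_a = l2_normalize(z_a)
    z_b = l2_normalize(z_b)
    logits = z_a @ z_b.T / temperature                   # (n_regions, n_regions)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


def reconstruction_loss(x, x_hat):
    """Mean squared error between input coverage and decoded coverage."""
    return float(np.mean((x - x_hat) ** 2))


# Simulated coverage for two replicates that share an underlying signal.
rng = np.random.default_rng(0)
n_regions, n_bins, dim = 8, 100, 4
signal = rng.poisson(5.0, size=(n_regions, n_bins)).astype(float)
rep_a = signal + rng.normal(0.0, 1.0, size=signal.shape)
rep_b = signal + rng.normal(0.0, 1.0, size=signal.shape)

# Hypothetical linear encoder/decoder pair; a real model would be a trained network.
W = rng.normal(0.0, 0.1, size=(n_bins, dim))
z_a, z_b = rep_a @ W, rep_b @ W
decoded_a = z_a @ W.T

print("contrastive loss:", replicate_contrastive_loss(z_a, z_b))
print("reconstruction loss:", reconstruction_loss(rep_a, decoded_a))
```

In this sketch the contrastive term is small when the same region has similar embeddings in both replicates and dissimilar embeddings to other regions. How the two terms are weighted and trained in RCL is not stated in this record.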