Allo: Accurate allocation of multi-mapped reads enables regulatory element analysis at repeats

Transposable elements (TEs) and other repetitive regions have been shown to contain gene regulatory elements, including transcription factor binding sites. Unfortunately, regulatory elements harbored by repeats have proven difficult to characterize using short-read sequencing assays such as ChIP-seq...

Bibliographic Details
Main Authors: Morrissey, Alexis; Shi, Jeffrey; James, Daniela Q.; Mahony, Shaun
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10515862/
https://www.ncbi.nlm.nih.gov/pubmed/37745557
http://dx.doi.org/10.1101/2023.09.12.556916
_version_ 1785109034810998784
author Morrissey, Alexis
Shi, Jeffrey
James, Daniela Q.
Mahony, Shaun
author_facet Morrissey, Alexis
Shi, Jeffrey
James, Daniela Q.
Mahony, Shaun
author_sort Morrissey, Alexis
collection PubMed
description Transposable elements (TEs) and other repetitive regions have been shown to contain gene regulatory elements, including transcription factor binding sites. Unfortunately, regulatory elements harbored by repeats have proven difficult to characterize using short-read sequencing assays such as ChIP-seq or ATAC-seq. Most regulatory genomics analysis pipelines discard “multi-mapped” reads that align equally well to multiple genomic locations. Since multi-mapped reads arise predominantly from repeats, current analysis pipelines fail to detect a substantial portion of regulatory events that occur in repetitive regions. To address this shortcoming, we developed Allo, a new approach to allocate multi-mapped reads in an efficient, accurate, and user-friendly manner. Allo combines probabilistic mapping of multi-mapped reads with a convolutional neural network that recognizes the read distribution features of potential peaks, offering enhanced accuracy in multi-mapping read assignment. Allo also provides read-level output in the form of a corrected alignment file, making it compatible with existing regulatory genomics analysis pipelines and downstream peak-finders. In a demonstration application on CTCF ChIP-seq data, we show that Allo results in the discovery of thousands of new CTCF peaks. Many of these peaks contain the expected cognate motif and/or serve as TAD boundaries. We additionally apply Allo to a diverse collection of ENCODE ChIP-seq datasets, resulting in multiple previously unidentified interactions between transcription factors and repetitive element families. Finally, we show that Allo may be particularly effective in identifying ChIP-seq peaks in younger TEs, which hold evolutionary significance due to their emergence during human evolution from primates.
format Online
Article
Text
id pubmed-10515862
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Cold Spring Harbor Laboratory
record_format MEDLINE/PubMed
spelling pubmed-10515862 2023-09-23 Allo: Accurate allocation of multi-mapped reads enables regulatory element analysis at repeats Morrissey, Alexis Shi, Jeffrey James, Daniela Q. Mahony, Shaun bioRxiv Article Transposable elements (TEs) and other repetitive regions have been shown to contain gene regulatory elements, including transcription factor binding sites. Unfortunately, regulatory elements harbored by repeats have proven difficult to characterize using short-read sequencing assays such as ChIP-seq or ATAC-seq. Most regulatory genomics analysis pipelines discard “multi-mapped” reads that align equally well to multiple genomic locations. Since multi-mapped reads arise predominantly from repeats, current analysis pipelines fail to detect a substantial portion of regulatory events that occur in repetitive regions. To address this shortcoming, we developed Allo, a new approach to allocate multi-mapped reads in an efficient, accurate, and user-friendly manner. Allo combines probabilistic mapping of multi-mapped reads with a convolutional neural network that recognizes the read distribution features of potential peaks, offering enhanced accuracy in multi-mapping read assignment. Allo also provides read-level output in the form of a corrected alignment file, making it compatible with existing regulatory genomics analysis pipelines and downstream peak-finders. In a demonstration application on CTCF ChIP-seq data, we show that Allo results in the discovery of thousands of new CTCF peaks. Many of these peaks contain the expected cognate motif and/or serve as TAD boundaries. We additionally apply Allo to a diverse collection of ENCODE ChIP-seq datasets, resulting in multiple previously unidentified interactions between transcription factors and repetitive element families. Finally, we show that Allo may be particularly effective in identifying ChIP-seq peaks in younger TEs, which hold evolutionary significance due to their emergence during human evolution from primates.
Cold Spring Harbor Laboratory 2023-09-15 /pmc/articles/PMC10515862/ /pubmed/37745557 http://dx.doi.org/10.1101/2023.09.12.556916 Text en https://creativecommons.org/licenses/by-nc/4.0/ This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (https://creativecommons.org/licenses/by-nc/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format for noncommercial purposes only, and only so long as attribution is given to the creator.
spellingShingle Article
Morrissey, Alexis
Shi, Jeffrey
James, Daniela Q.
Mahony, Shaun
Allo: Accurate allocation of multi-mapped reads enables regulatory element analysis at repeats
title Allo: Accurate allocation of multi-mapped reads enables regulatory element analysis at repeats
title_full Allo: Accurate allocation of multi-mapped reads enables regulatory element analysis at repeats
title_fullStr Allo: Accurate allocation of multi-mapped reads enables regulatory element analysis at repeats
title_full_unstemmed Allo: Accurate allocation of multi-mapped reads enables regulatory element analysis at repeats
title_short Allo: Accurate allocation of multi-mapped reads enables regulatory element analysis at repeats
title_sort allo: accurate allocation of multi-mapped reads enables regulatory element analysis at repeats
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10515862/
https://www.ncbi.nlm.nih.gov/pubmed/37745557
http://dx.doi.org/10.1101/2023.09.12.556916
work_keys_str_mv AT morrisseyalexis alloaccurateallocationofmultimappedreadsenablesregulatoryelementanalysisatrepeats
AT shijeffrey alloaccurateallocationofmultimappedreadsenablesregulatoryelementanalysisatrepeats
AT jamesdanielaq alloaccurateallocationofmultimappedreadsenablesregulatoryelementanalysisatrepeats
AT mahonyshaun alloaccurateallocationofmultimappedreadsenablesregulatoryelementanalysisatrepeats