DECODE: a Deep-learning framework for Condensing enhancers and refining boundaries with large-scale functional assays

MOTIVATION: Mapping distal regulatory elements, such as enhancers, is a cornerstone for elucidating how genetic variations may influence diseases. Previous enhancer-prediction methods have used either unsupervised approaches or supervised methods with limited training data. Moreover, past approaches...

Bibliographic Details
Main authors: Chen, Zhanlin, Zhang, Jing, Liu, Jason, Dai, Yi, Lee, Donghoon, Min, Martin Renqiang, Xu, Min, Gerstein, Mark
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects: Regulatory and Functional Genomics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8275369/
https://www.ncbi.nlm.nih.gov/pubmed/34252960
http://dx.doi.org/10.1093/bioinformatics/btab283
author Chen, Zhanlin
Zhang, Jing
Liu, Jason
Dai, Yi
Lee, Donghoon
Min, Martin Renqiang
Xu, Min
Gerstein, Mark
collection PubMed
description MOTIVATION: Mapping distal regulatory elements, such as enhancers, is a cornerstone for elucidating how genetic variations may influence diseases. Previous enhancer-prediction methods have used either unsupervised approaches or supervised methods with limited training data. Moreover, past approaches have implemented enhancer discovery as a binary classification problem without accurate boundary detection, producing low-resolution annotations with superfluous regions and reducing the statistical power for downstream analyses (e.g. causal variant mapping and functional validations). Here, we addressed these challenges via a two-step model called Deep-learning framework for Condensing enhancers and refining boundaries with large-scale functional assays (DECODE). First, we employed direct enhancer-activity readouts from novel functional characterization assays, such as STARR-seq, to train a deep neural network for accurate cell-type-specific enhancer prediction. Second, to improve the annotation resolution, we implemented a weakly supervised object detection framework for enhancer localization with precise boundary detection (to a 10 bp resolution) using Gradient-weighted Class Activation Mapping. RESULTS: Our DECODE binary classifier outperformed a state-of-the-art enhancer prediction method by 24% in transgenic mouse validation. Furthermore, the object detection framework can condense enhancer annotations to only 13% of their original size, and these compact annotations have significantly higher conservation scores and genome-wide association study variant enrichments than the original predictions. Overall, DECODE is an effective tool for enhancer classification and precise localization. AVAILABILITY AND IMPLEMENTATION: DECODE source code and pre-processing scripts are available at decode.gersteinlab.org. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
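The localization step described in the abstract (Gradient-weighted Class Activation Mapping followed by boundary refinement at 10 bp resolution) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `grad_cam_1d` and `refine_boundaries`, the toy activations and gradients, and the half-of-peak cutoff are all assumptions made for illustration. Only the general Grad-CAM recipe (weight each channel by its position-averaged gradient, sum the weighted channels, apply ReLU) and the 10 bp bin size come from the description above.

```python
def grad_cam_1d(activations, gradients):
    """Grad-CAM over a 1D sequence.

    activations, gradients: one list per channel, each holding one value
    per sequence bin (the gradient of the class score w.r.t. that activation).
    """
    n_bins = len(activations[0])
    # Channel weight = gradient averaged over all positions of that channel.
    weights = [sum(g) / len(g) for g in gradients]
    cam = []
    for i in range(n_bins):
        score = sum(w * a[i] for w, a in zip(weights, activations))
        cam.append(max(score, 0.0))  # ReLU keeps only positive evidence
    return cam


def refine_boundaries(cam, start_bp, bin_size=10, frac=0.5):
    """Return (start, end) in bp of the longest contiguous run of bins whose
    map value is at least frac * peak. The cutoff rule is a hypothetical choice."""
    peak = max(cam)
    if peak <= 0:
        return None
    cutoff = frac * peak
    best, run_start = None, None
    # A sentinel below any cutoff closes a run that reaches the last bin.
    for i, value in enumerate(cam + [float("-inf")]):
        if value >= cutoff:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if best is None or i - run_start > best[1] - best[0]:
                best = (run_start, i)
            run_start = None
    return (start_bp + best[0] * bin_size, start_bp + best[1] * bin_size)


# Toy data (made up): 2 channels over 8 bins of 10 bp each.
activations = [[0, 0, 1, 2, 2, 1, 0, 0],
               [1, 1, 0, 0, 0, 0, 1, 1]]
gradients = [[0.5] * 8, [-0.25] * 8]

cam = grad_cam_1d(activations, gradients)
bounds = refine_boundaries(cam, start_bp=1000)
print(cam)
print(bounds)
```

In this toy case the 80 bp input window starting at position 1000 is narrowed to the 40 bp interval where the map stays at or above half of its peak, which is the kind of condensing the abstract reports at genome scale.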
format Online
Article
Text
id pubmed-8275369
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8275369 2021-07-13 Bioinformatics Regulatory and Functional Genomics Oxford University Press 2021-07-12 /pmc/articles/PMC8275369/ /pubmed/34252960 http://dx.doi.org/10.1093/bioinformatics/btab283 Text en © The Author(s) 2021. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title DECODE: a Deep-learning framework for Condensing enhancers and refining boundaries with large-scale functional assays
topic Regulatory and Functional Genomics