Assessing deep learning methods in cis-regulatory motif finding based on genomic sequencing data

Identifying cis-regulatory motifs from genomic sequencing data (e.g. ChIP-seq and CLIP-seq) is crucial in identifying transcription factor (TF) binding sites and inferring gene regulatory mechanisms for any organism. Since 2015, deep learning (DL) methods have been widely applied to identify TF bind...

Bibliographic Details
Main authors: Zhang, Shuangquan, Ma, Anjun, Zhao, Jing, Xu, Dong, Ma, Qin, Wang, Yan
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8769700/
https://www.ncbi.nlm.nih.gov/pubmed/34607350
http://dx.doi.org/10.1093/bib/bbab374
_version_ 1784635207846985728
author Zhang, Shuangquan
Ma, Anjun
Zhao, Jing
Xu, Dong
Ma, Qin
Wang, Yan
collection PubMed
description Identifying cis-regulatory motifs from genomic sequencing data (e.g. ChIP-seq and CLIP-seq) is crucial in identifying transcription factor (TF) binding sites and inferring gene regulatory mechanisms for any organism. Since 2015, deep learning (DL) methods have been widely applied to identify TF binding sites and predict motif patterns, with the strengths of offering a scalable, flexible and unified computational approach for highly accurate predictions. As far as we know, 20 DL methods have been developed. However, without a clear and systematic assessment, users will struggle to choose the most appropriate tool for their specific studies. In this manuscript, we evaluated 20 DL methods for cis-regulatory motif prediction using 690 ENCODE ChIP-seq, 126 cancer ChIP-seq and 55 RNA CLIP-seq data. Four metrics were investigated, including the accuracy of motif finding, the performance of DNA/RNA sequence classification, algorithm scalability and tool usability. The assessment results demonstrated the high complementarity of the existing DL methods. It was determined that the most suitable model should primarily depend on the data size and type and the method’s outputs.
format Online
Article
Text
id pubmed-8769700
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8769700 2022-01-20 Brief Bioinform Review Oxford University Press 2021-10-05 /pmc/articles/PMC8769700/ /pubmed/34607350 http://dx.doi.org/10.1093/bib/bbab374 Text en © The Author(s) 2021. Published by Oxford University Press. https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Assessing deep learning methods in cis-regulatory motif finding based on genomic sequencing data
topic Review
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8769700/
https://www.ncbi.nlm.nih.gov/pubmed/34607350
http://dx.doi.org/10.1093/bib/bbab374