Annotating TSSs in Multiple Cell Types Based on DNA Sequence and RNA-seq Data via DeeReCT-TSS

The accurate annotation of transcription start sites (TSSs) and their usage are critical for the mechanistic understanding of gene regulation in different biological contexts. To fulfill this, specific high-throughput experimental technologies have been developed to capture TSSs in a genome-wide man...

Bibliographic Details
Main authors: Zhou, Juexiao, Zhang, Bin, Li, Haoyang, Zhou, Longxi, Li, Zhongxiao, Long, Yongkang, Han, Wenkai, Wang, Mengran, Cui, Huanhuan, Li, Jingjing, Chen, Wei, Gao, Xin
Format: Online Article Text
Language: English
Published: Elsevier 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10025762/
https://www.ncbi.nlm.nih.gov/pubmed/36528241
http://dx.doi.org/10.1016/j.gpb.2022.11.010
_version_ 1784909406647877632
author Zhou, Juexiao
Zhang, Bin
Li, Haoyang
Zhou, Longxi
Li, Zhongxiao
Long, Yongkang
Han, Wenkai
Wang, Mengran
Cui, Huanhuan
Li, Jingjing
Chen, Wei
Gao, Xin
author_facet Zhou, Juexiao
Zhang, Bin
Li, Haoyang
Zhou, Longxi
Li, Zhongxiao
Long, Yongkang
Han, Wenkai
Wang, Mengran
Cui, Huanhuan
Li, Jingjing
Chen, Wei
Gao, Xin
author_sort Zhou, Juexiao
collection PubMed
description The accurate annotation of transcription start sites (TSSs) and their usage are critical for the mechanistic understanding of gene regulation in different biological contexts. To fulfill this, specific high-throughput experimental technologies have been developed to capture TSSs in a genome-wide manner, and various computational tools have also been developed for in silico prediction of TSSs solely based on genomic sequences. Most of these computational tools cast the problem as a binary classification task on a balanced dataset, thus resulting in drastic false positive predictions when applied on the genome scale. Here, we present DeeReCT-TSS, a deep learning-based method that is capable of identifying TSSs across the whole genome based on both DNA sequence and conventional RNA sequencing data. We show that by effectively incorporating these two sources of information, DeeReCT-TSS significantly outperforms other solely sequence-based methods on the precise annotation of TSSs used in different cell types. Furthermore, we develop a meta-learning-based extension for simultaneous TSS annotations on 10 cell types, which enables the identification of cell type-specific TSSs. Finally, we demonstrate the high precision of DeeReCT-TSS on two independent datasets by correlating our predicted TSSs with experimentally defined TSS chromatin states. The source code for DeeReCT-TSS is available at https://github.com/JoshuaChou2018/DeeReCT-TSS_release and https://ngdc.cncb.ac.cn/biocode/tools/BT007316.
format Online
Article
Text
id pubmed-10025762
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-100257622023-03-21 Annotating TSSs in Multiple Cell Types Based on DNA Sequence and RNA-seq Data via DeeReCT-TSS Zhou, Juexiao Zhang, Bin Li, Haoyang Zhou, Longxi Li, Zhongxiao Long, Yongkang Han, Wenkai Wang, Mengran Cui, Huanhuan Li, Jingjing Chen, Wei Gao, Xin Genomics Proteomics Bioinformatics Method The accurate annotation of transcription start sites (TSSs) and their usage are critical for the mechanistic understanding of gene regulation in different biological contexts. To fulfill this, specific high-throughput experimental technologies have been developed to capture TSSs in a genome-wide manner, and various computational tools have also been developed for in silico prediction of TSSs solely based on genomic sequences. Most of these computational tools cast the problem as a binary classification task on a balanced dataset, thus resulting in drastic false positive predictions when applied on the genome scale. Here, we present DeeReCT-TSS, a deep learning-based method that is capable of identifying TSSs across the whole genome based on both DNA sequence and conventional RNA sequencing data. We show that by effectively incorporating these two sources of information, DeeReCT-TSS significantly outperforms other solely sequence-based methods on the precise annotation of TSSs used in different cell types. Furthermore, we develop a meta-learning-based extension for simultaneous TSS annotations on 10 cell types, which enables the identification of cell type-specific TSSs. Finally, we demonstrate the high precision of DeeReCT-TSS on two independent datasets by correlating our predicted TSSs with experimentally defined TSS chromatin states. The source code for DeeReCT-TSS is available at https://github.com/JoshuaChou2018/DeeReCT-TSS_release and https://ngdc.cncb.ac.cn/biocode/tools/BT007316. 
Elsevier 2022-10 2022-12-15 /pmc/articles/PMC10025762/ /pubmed/36528241 http://dx.doi.org/10.1016/j.gpb.2022.11.010 Text en © 2022 The Authors https://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Method
Zhou, Juexiao
Zhang, Bin
Li, Haoyang
Zhou, Longxi
Li, Zhongxiao
Long, Yongkang
Han, Wenkai
Wang, Mengran
Cui, Huanhuan
Li, Jingjing
Chen, Wei
Gao, Xin
Annotating TSSs in Multiple Cell Types Based on DNA Sequence and RNA-seq Data via DeeReCT-TSS
title Annotating TSSs in Multiple Cell Types Based on DNA Sequence and RNA-seq Data via DeeReCT-TSS
title_full Annotating TSSs in Multiple Cell Types Based on DNA Sequence and RNA-seq Data via DeeReCT-TSS
title_fullStr Annotating TSSs in Multiple Cell Types Based on DNA Sequence and RNA-seq Data via DeeReCT-TSS
title_full_unstemmed Annotating TSSs in Multiple Cell Types Based on DNA Sequence and RNA-seq Data via DeeReCT-TSS
title_short Annotating TSSs in Multiple Cell Types Based on DNA Sequence and RNA-seq Data via DeeReCT-TSS
title_sort annotating tsss in multiple cell types based on dna sequence and rna-seq data via deerect-tss
topic Method
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10025762/
https://www.ncbi.nlm.nih.gov/pubmed/36528241
http://dx.doi.org/10.1016/j.gpb.2022.11.010
work_keys_str_mv AT zhoujuexiao annotatingtsssinmultiplecelltypesbasedondnasequenceandrnaseqdataviadeerecttss
AT zhangbin annotatingtsssinmultiplecelltypesbasedondnasequenceandrnaseqdataviadeerecttss
AT lihaoyang annotatingtsssinmultiplecelltypesbasedondnasequenceandrnaseqdataviadeerecttss
AT zhoulongxi annotatingtsssinmultiplecelltypesbasedondnasequenceandrnaseqdataviadeerecttss
AT lizhongxiao annotatingtsssinmultiplecelltypesbasedondnasequenceandrnaseqdataviadeerecttss
AT longyongkang annotatingtsssinmultiplecelltypesbasedondnasequenceandrnaseqdataviadeerecttss
AT hanwenkai annotatingtsssinmultiplecelltypesbasedondnasequenceandrnaseqdataviadeerecttss
AT wangmengran annotatingtsssinmultiplecelltypesbasedondnasequenceandrnaseqdataviadeerecttss
AT cuihuanhuan annotatingtsssinmultiplecelltypesbasedondnasequenceandrnaseqdataviadeerecttss
AT lijingjing annotatingtsssinmultiplecelltypesbasedondnasequenceandrnaseqdataviadeerecttss
AT chenwei annotatingtsssinmultiplecelltypesbasedondnasequenceandrnaseqdataviadeerecttss
AT gaoxin annotatingtsssinmultiplecelltypesbasedondnasequenceandrnaseqdataviadeerecttss