Annotating TSSs in Multiple Cell Types Based on DNA Sequence and RNA-seq Data via DeeReCT-TSS
The accurate annotation of transcription start sites (TSSs) and their usage are critical for the mechanistic understanding of gene regulation in different biological contexts. To fulfill this, specific high-throughput experimental technologies have been developed to capture TSSs in a genome-wide man...
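Main Authors: | Zhou, Juexiao; Zhang, Bin; Li, Haoyang; Zhou, Longxi; Li, Zhongxiao; Long, Yongkang; Han, Wenkai; Wang, Mengran; Cui, Huanhuan; Li, Jingjing; Chen, Wei; Gao, Xin |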
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Elsevier 2022 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10025762/ https://www.ncbi.nlm.nih.gov/pubmed/36528241 http://dx.doi.org/10.1016/j.gpb.2022.11.010 |
_version_ | 1784909406647877632 |
---|---|
author | Zhou, Juexiao Zhang, Bin Li, Haoyang Zhou, Longxi Li, Zhongxiao Long, Yongkang Han, Wenkai Wang, Mengran Cui, Huanhuan Li, Jingjing Chen, Wei Gao, Xin |
author_sort | Zhou, Juexiao |
collection | PubMed |
description | The accurate annotation of transcription start sites (TSSs) and their usage are critical for the mechanistic understanding of gene regulation in different biological contexts. To fulfill this, specific high-throughput experimental technologies have been developed to capture TSSs in a genome-wide manner, and various computational tools have also been developed for in silico prediction of TSSs solely based on genomic sequences. Most of these computational tools cast the problem as a binary classification task on a balanced dataset, thus resulting in drastic false positive predictions when applied on the genome scale. Here, we present DeeReCT-TSS, a deep learning-based method that is capable of identifying TSSs across the whole genome based on both DNA sequence and conventional RNA sequencing data. We show that by effectively incorporating these two sources of information, DeeReCT-TSS significantly outperforms other solely sequence-based methods on the precise annotation of TSSs used in different cell types. Furthermore, we develop a meta-learning-based extension for simultaneous TSS annotations on 10 cell types, which enables the identification of cell type-specific TSSs. Finally, we demonstrate the high precision of DeeReCT-TSS on two independent datasets by correlating our predicted TSSs with experimentally defined TSS chromatin states. The source code for DeeReCT-TSS is available at https://github.com/JoshuaChou2018/DeeReCT-TSS_release and https://ngdc.cncb.ac.cn/biocode/tools/BT007316. |
format | Online Article Text |
id | pubmed-10025762 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Elsevier |
record_format | MEDLINE/PubMed |
spelling | pubmed-10025762 2023-03-21 Genomics Proteomics Bioinformatics Method Elsevier 2022-10 2022-12-15 /pmc/articles/PMC10025762/ /pubmed/36528241 http://dx.doi.org/10.1016/j.gpb.2022.11.010 Text en © 2022 The Authors https://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). |
title | Annotating TSSs in Multiple Cell Types Based on DNA Sequence and RNA-seq Data via DeeReCT-TSS |
topic | Method |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10025762/ https://www.ncbi.nlm.nih.gov/pubmed/36528241 http://dx.doi.org/10.1016/j.gpb.2022.11.010 |