Geographic encoding of transcripts enabled high-accuracy and isoform-aware deep learning of RNA methylation

As the most pervasive epigenetic mark present on mRNA and lncRNA, N(6)-methyladenosine (m(6)A) RNA methylation regulates all stages of RNA life in various biological processes and disease mechanisms. Computational methods for deciphering RNA modification have achieved great success in recent years;...

Bibliographic Details
Main Authors: Huang, Daiyun, Chen, Kunqi, Song, Bowen, Wei, Zhen, Su, Jionglong, Coenen, Frans, de Magalhães, João Pedro, Rigden, Daniel J, Meng, Jia
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects: Computational Biology
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9561283/
https://www.ncbi.nlm.nih.gov/pubmed/36155798
http://dx.doi.org/10.1093/nar/gkac830
author Huang, Daiyun
Chen, Kunqi
Song, Bowen
Wei, Zhen
Su, Jionglong
Coenen, Frans
de Magalhães, João Pedro
Rigden, Daniel J
Meng, Jia
collection PubMed
description As the most pervasive epigenetic mark present on mRNA and lncRNA, N(6)-methyladenosine (m(6)A) RNA methylation regulates all stages of RNA life in various biological processes and disease mechanisms. Computational methods for deciphering RNA modification have achieved great success in recent years; nevertheless, their potential remains underexploited. One reason for this is that existing models usually consider only the sequence of transcripts, ignoring the various regions (or geography) of transcripts such as 3′UTR and intron, where the epigenetic mark forms and functions. Here, we developed three simple yet powerful encoding schemes for transcripts to capture the submolecular geographic information of RNA, which is largely independent from sequences. We show that m(6)A prediction models based on geographic information alone can achieve comparable performances to classic sequence-based methods. Importantly, geographic information substantially enhances the accuracy of sequence-based models, enables isoform- and tissue-specific prediction of m(6)A sites, and improves m(6)A signal detection from direct RNA sequencing data. The geographic encoding schemes we developed have exhibited strong interpretability, and are applicable to not only m(6)A but also N(1)-methyladenosine (m(1)A), and can serve as a general and effective complement to the widely used sequence encoding schemes in deep learning applications concerning RNA transcripts.
format Online
Article
Text
id pubmed-9561283
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9561283 2022-10-18 Nucleic Acids Res Computational Biology Oxford University Press 2022-09-26 /pmc/articles/PMC9561283/ /pubmed/36155798 http://dx.doi.org/10.1093/nar/gkac830 Text en © The Author(s) 2022. Published by Oxford University Press on behalf of Nucleic Acids Research.
https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Geographic encoding of transcripts enabled high-accuracy and isoform-aware deep learning of RNA methylation
topic Computational Biology