DeeReCT-PolyA: a robust and generic deep learning method for PAS identification

MOTIVATION: Polyadenylation is a critical step for gene expression regulation during the maturation of mRNA. An accurate and robust method for poly(A) signals (PASs) identification is not only desired for the purpose of better transcripts’ end annotation, but can also help us gain a deeper insight o...

Bibliographic Details
Main authors: Xia, Zhihao, Li, Yu, Zhang, Bin, Li, Zhongxiao, Hu, Yuhui, Chen, Wei, Gao, Xin
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6612895/
https://www.ncbi.nlm.nih.gov/pubmed/30500881
http://dx.doi.org/10.1093/bioinformatics/bty991
author Xia, Zhihao
Li, Yu
Zhang, Bin
Li, Zhongxiao
Hu, Yuhui
Chen, Wei
Gao, Xin
author_sort Xia, Zhihao
collection PubMed
description MOTIVATION: Polyadenylation is a critical step for gene expression regulation during the maturation of mRNA. An accurate and robust method for poly(A) signal (PAS) identification is desired not only for better annotation of transcript ends, but also because it can give us deeper insight into the underlying regulatory mechanism. Although many methods have been proposed for PAS recognition, most of them are PAS motif- and human-specific, which leads to high risks of overfitting, low generalization power, and inability to reveal the connections between the underlying mechanisms of different mammals. RESULTS: In this work, we propose a robust, PAS motif-agnostic, highly interpretable and transferable deep learning model for accurate PAS recognition, which requires no prior knowledge or human-designed features. We show that our single model trained over all human PAS motifs not only outperforms the state-of-the-art methods trained on specific motifs, but also generalizes well to two mouse datasets. Moreover, we further increase the prediction accuracy by transferring the deep learning model trained on the data of one species to the data of a different species. Several novel underlying poly(A) patterns are revealed through the visualization of important oligomers and positions in our trained models. Finally, we interpret the deep learning models by converting the convolutional filters into sequence logos and quantitatively compare the sequence logos between human and mouse datasets. AVAILABILITY AND IMPLEMENTATION: https://github.com/likesum/DeeReCT-PolyA SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-6612895
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6612895 2019-07-12 DeeReCT-PolyA: a robust and generic deep learning method for PAS identification Xia, Zhihao Li, Yu Zhang, Bin Li, Zhongxiao Hu, Yuhui Chen, Wei Gao, Xin Bioinformatics Original Papers
Oxford University Press 2019-07 2018-11-30 /pmc/articles/PMC6612895/ /pubmed/30500881 http://dx.doi.org/10.1093/bioinformatics/bty991 Text en © The Author(s) 2018. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title DeeReCT-PolyA: a robust and generic deep learning method for PAS identification
topic Original Papers