
Poly(A)-DG: A deep-learning-based domain generalization method to identify cross-species Poly(A) signal without prior knowledge from target species

In eukaryotes, polyadenylation (poly(A)) is an essential process during mRNA maturation. Identifying the cis-determinants of the poly(A) signal (PAS) on the DNA sequence is key to understanding the mechanism of translation regulation and mRNA metabolism. Although machine learning methods were widely us...


Bibliographic Details
Main authors: Zheng, Yumin; Wang, Haohan; Zhang, Yang; Gao, Xin; Xing, Eric P.; Xu, Min
Format: Online Article Text
Language: English
Published: Public Library of Science 2020
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7671507/
https://www.ncbi.nlm.nih.gov/pubmed/33151940
http://dx.doi.org/10.1371/journal.pcbi.1008297
author Zheng, Yumin
Wang, Haohan
Zhang, Yang
Gao, Xin
Xing, Eric P.
Xu, Min
collection PubMed
description In eukaryotes, polyadenylation (poly(A)) is an essential process during mRNA maturation. Identifying the cis-determinants of the poly(A) signal (PAS) on the DNA sequence is key to understanding the mechanism of translation regulation and mRNA metabolism. Although machine learning methods have been widely used to identify PAS computationally, their need for large amounts of annotated data hinders the application of existing methods to species without experimental PAS data. Cross-species PAS identification, which makes it possible to predict PAS in species not seen during training, is therefore a promising direction. In this work, we propose a novel deep learning method named Poly(A)-DG for cross-species PAS identification. Poly(A)-DG consists of a Convolutional Neural Network-Multilayer Perceptron (CNN-MLP) network and a domain generalization technique. It learns PAS patterns from the training species and identifies PAS in target species without re-training. To test our method, we use four species, build cross-species training sets from two of them, and evaluate performance on the remaining ones. Moreover, we test our method under insufficient-data and imbalanced-data conditions and demonstrate that Poly(A)-DG not only outperforms state-of-the-art methods but also maintains relatively high accuracy when trained on a smaller or imbalanced training set.
format Online
Article
Text
id pubmed-7671507
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-7671507 2020-11-19 PLoS Comput Biol Research Article
Public Library of Science 2020-11-05 /pmc/articles/PMC7671507/ /pubmed/33151940 http://dx.doi.org/10.1371/journal.pcbi.1008297 Text en © 2020 Zheng et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Poly(A)-DG: A deep-learning-based domain generalization method to identify cross-species Poly(A) signal without prior knowledge from target species
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7671507/
https://www.ncbi.nlm.nih.gov/pubmed/33151940
http://dx.doi.org/10.1371/journal.pcbi.1008297