Incorporating Deep Learning With Word Embedding to Identify Plant Ubiquitylation Sites

Protein ubiquitylation is an important posttranslational modification (PTM), which is involved in diverse biological processes and plays an essential role in the regulation of physiological mechanisms and diseases. The Protein Lysine Modifications Database (PLMD) has accumulated abundant ubiquitylat...

Bibliographic Details
Main Authors: Wang, Hongfei; Wang, Zhuo; Li, Zhongyan; Lee, Tzong-Yi
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2020
Subjects: Cell and Developmental Biology
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7554246/
https://www.ncbi.nlm.nih.gov/pubmed/33102477
http://dx.doi.org/10.3389/fcell.2020.572195
author Wang, Hongfei
Wang, Zhuo
Li, Zhongyan
Lee, Tzong-Yi
author_sort Wang, Hongfei
collection PubMed
description Protein ubiquitylation is an important posttranslational modification (PTM) that is involved in diverse biological processes and plays an essential role in regulating physiological mechanisms and diseases. The Protein Lysine Modifications Database (PLMD) has accumulated abundant ubiquitylated proteins, with their substrate sites, for more than 20 species. Numerous studies have consequently developed ubiquitylation site prediction tools across all species, relying mainly on predefined sequence features and machine learning algorithms. However, the differences in ubiquitylation patterns between these species remain unclear. In this work, sequence-based characterization of ubiquitylated substrate sites revealed remarkable differences among plants, animals, and fungi. An improved word-embedding scheme based on a transfer learning strategy was then combined with a multilayer convolutional neural network (CNN) to identify protein ubiquitylation sites. For the prediction of plant ubiquitylation sites, the proposed deep learning scheme outperformed machine learning-based methods, with an accuracy of 75.6%, precision of 73.3%, recall of 76.7%, F-score of 0.7493, and AUC of 0.82 on the independent testing set. Although the specificity of ubiquitylated substrate sites is complicated, this work demonstrates that word embedding can extract informative features and help identify ubiquitylated sites. To accelerate the investigation of protein ubiquitylation, the data sets and source code used in this study are freely available at https://github.com/wang-hong-fei/DL-plant-ubsites-prediction.
format Online
Article
Text
id pubmed-7554246
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-7554246 2020-10-22 Incorporating Deep Learning With Word Embedding to Identify Plant Ubiquitylation Sites Wang, Hongfei Wang, Zhuo Li, Zhongyan Lee, Tzong-Yi Front Cell Dev Biol Cell and Developmental Biology Frontiers Media S.A. 2020-09-30 /pmc/articles/PMC7554246/ /pubmed/33102477 http://dx.doi.org/10.3389/fcell.2020.572195 Text en Copyright © 2020 Wang, Wang, Li and Lee. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Incorporating Deep Learning With Word Embedding to Identify Plant Ubiquitylation Sites
topic Cell and Developmental Biology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7554246/
https://www.ncbi.nlm.nih.gov/pubmed/33102477
http://dx.doi.org/10.3389/fcell.2020.572195
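
The performance figures quoted in the description can be cross-checked against each other. The sketch below assumes that the reported F-score is the standard F1 measure (the harmonic mean of precision and recall); the record itself does not state which variant was used, and the function name `f_score` is purely illustrative.

```python
def f_score(precision: float, recall: float) -> float:
    """Return the F1 score, i.e. the harmonic mean of precision and recall.

    Assumption: the F-score in the record is the standard F1 measure.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values quoted in the description for the independent testing set.
precision = 0.733
recall = 0.767

f1 = f_score(precision, recall)
print(f"F1 from rounded precision/recall: {f1:.4f}")  # prints 0.7496
```

The recomputed value, 0.7496, differs from the reported 0.7493 by about 0.0003. A gap of that size is consistent with precision and recall having been rounded to three significant figures before being quoted.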