Incorporating Deep Learning With Word Embedding to Identify Plant Ubiquitylation Sites
Protein ubiquitylation is an important posttranslational modification (PTM), which is involved in diverse biological processes and plays an essential role in the regulation of physiological mechanisms and diseases. The Protein Lysine Modifications Database (PLMD) has accumulated abundant ubiquitylat...
Main Authors: | Wang, Hongfei; Wang, Zhuo; Li, Zhongyan; Lee, Tzong-Yi |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2020 |
Subjects: | Cell and Developmental Biology |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7554246/ https://www.ncbi.nlm.nih.gov/pubmed/33102477 http://dx.doi.org/10.3389/fcell.2020.572195 |
_version_ | 1783593740914393088 |
---|---|
author | Wang, Hongfei Wang, Zhuo Li, Zhongyan Lee, Tzong-Yi |
author_facet | Wang, Hongfei Wang, Zhuo Li, Zhongyan Lee, Tzong-Yi |
author_sort | Wang, Hongfei |
collection | PubMed |
description | Protein ubiquitylation is an important posttranslational modification (PTM), which is involved in diverse biological processes and plays an essential role in the regulation of physiological mechanisms and diseases. The Protein Lysine Modifications Database (PLMD) has accumulated abundant ubiquitylated proteins with their substrate sites for more than 20 species. Numerous works have consequently developed a variety of ubiquitylation site prediction tools across all species, mainly relying on predefined sequence features and machine learning algorithms. However, the difference in ubiquitylated patterns between these species remains unclear. In this work, the sequence-based characterization of ubiquitylated substrate sites has revealed remarkable differences among plants, animals, and fungi. An improved word-embedding scheme based on a transfer learning strategy was then combined with a multilayer convolutional neural network (CNN) to identify protein ubiquitylation sites. For the prediction of plant ubiquitylation sites, the proposed deep learning scheme could outperform the machine learning-based methods, with an accuracy of 75.6%, a precision of 73.3%, a recall of 76.7%, an F-score of 0.7493, and an AUC of 0.82 on the independent testing set. Although the ubiquitylated specificity of substrate sites is complicated, this work has demonstrated that the application of the word-embedding method can enable the extraction of informative features and help the identification of ubiquitylated sites. To accelerate the investigation of protein ubiquitylation, the data sets and source code used in this study are freely available at https://github.com/wang-hong-fei/DL-plant-ubsites-prediction. |
format | Online Article Text |
id | pubmed-7554246 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-75542462020-10-22 Incorporating Deep Learning With Word Embedding to Identify Plant Ubiquitylation Sites Wang, Hongfei Wang, Zhuo Li, Zhongyan Lee, Tzong-Yi Front Cell Dev Biol Cell and Developmental Biology Protein ubiquitylation is an important posttranslational modification (PTM), which is involved in diverse biological processes and plays an essential role in the regulation of physiological mechanisms and diseases. The Protein Lysine Modifications Database (PLMD) has accumulated abundant ubiquitylated proteins with their substrate sites for more than 20 kinds of species. Numerous works have consequently developed a variety of ubiquitylation site prediction tools across all species, mainly relying on the predefined sequence features and machine learning algorithms. However, the difference in ubiquitylated patterns between these species stays unclear. In this work, the sequence-based characterization of ubiquitylated substrate sites has revealed remarkable differences among plants, animals, and fungi. Then an improved word-embedding scheme based on the transfer learning strategy was incorporated with the multilayer convolutional neural network (CNN) for identifying protein ubiquitylation sites. For the prediction of plant ubiquitylation sites, the proposed deep learning scheme could outperform the machine learning-based methods, with the accuracy of 75.6%, precision of 73.3%, recall of 76.7%, F-score of 0.7493, and 0.82 AUC on the independent testing set. Although the ubiquitylated specificity of substrate sites is complicated, this work has demonstrated that the application of the word-embedding method can enable the extraction of informative features and help the identification of ubiquitylated sites. To accelerate the investigation of protein ubiquitylation, the data sets and source code used in this study are freely available at https://github.com/wang-hong-fei/DL-plant-ubsites-prediction. Frontiers Media S.A. 
2020-09-30 /pmc/articles/PMC7554246/ /pubmed/33102477 http://dx.doi.org/10.3389/fcell.2020.572195 Text en Copyright © 2020 Wang, Wang, Li and Lee. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Cell and Developmental Biology Wang, Hongfei Wang, Zhuo Li, Zhongyan Lee, Tzong-Yi Incorporating Deep Learning With Word Embedding to Identify Plant Ubiquitylation Sites |
title | Incorporating Deep Learning With Word Embedding to Identify Plant Ubiquitylation Sites |
title_full | Incorporating Deep Learning With Word Embedding to Identify Plant Ubiquitylation Sites |
title_fullStr | Incorporating Deep Learning With Word Embedding to Identify Plant Ubiquitylation Sites |
title_full_unstemmed | Incorporating Deep Learning With Word Embedding to Identify Plant Ubiquitylation Sites |
title_short | Incorporating Deep Learning With Word Embedding to Identify Plant Ubiquitylation Sites |
title_sort | incorporating deep learning with word embedding to identify plant ubiquitylation sites |
topic | Cell and Developmental Biology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7554246/ https://www.ncbi.nlm.nih.gov/pubmed/33102477 http://dx.doi.org/10.3389/fcell.2020.572195 |
work_keys_str_mv | AT wanghongfei incorporatingdeeplearningwithwordembeddingtoidentifyplantubiquitylationsites AT wangzhuo incorporatingdeeplearningwithwordembeddingtoidentifyplantubiquitylationsites AT lizhongyan incorporatingdeeplearningwithwordembeddingtoidentifyplantubiquitylationsites AT leetzongyi incorporatingdeeplearningwithwordembeddingtoidentifyplantubiquitylationsites |
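
The description field reports a precision of 73.3%, a recall of 76.7%, and an F-score of 0.7493 on the independent testing set. As a quick consistency check, the sketch below recomputes the F-score from the rounded precision and recall. It assumes that "F-score" means the standard F1 (the harmonic mean of precision and recall), which the record does not state explicitly; the function and variable names are illustrative and not taken from the authors' code.

```python
def f_score(precision: float, recall: float) -> float:
    """Return the F1 score, the harmonic mean of precision and recall.

    Assumption: the record's "F-score" is F1; the abstract does not say so.
    """
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Values as reported (rounded) in the description field of the record.
reported_precision = 0.733
reported_recall = 0.767
reported_f = 0.7493

recomputed_f = f_score(reported_precision, reported_recall)
difference = abs(recomputed_f - reported_f)

print(f"recomputed F-score: {recomputed_f:.4f}")
print(f"difference from reported value: {difference:.4f}")
```

Because the precision and recall in the record are already rounded to one decimal place, a small gap between the recomputed and reported values is expected and does not by itself indicate an inconsistency.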