A representation and deep learning model for annotating ubiquitylation sentences stating E3 ligase - substrate interaction

BACKGROUND: Ubiquitylation is an important post-translational modification of proteins that not only plays a central role in cellular coding, but is also closely associated with the development of a variety of diseases. The specific selection of substrates by E3 ligases is the key step in ubiquitylation. A...

Bibliographic Details
Main authors: Luo, Mengqi; Li, Zhongyan; Li, Shangfu; Lee, Tzong-Yi
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8524957/
https://www.ncbi.nlm.nih.gov/pubmed/34663215
http://dx.doi.org/10.1186/s12859-021-04435-7
_version_ 1784585579787190272
author Luo, Mengqi
Li, Zhongyan
Li, Shangfu
Lee, Tzong-Yi
author_facet Luo, Mengqi
Li, Zhongyan
Li, Shangfu
Lee, Tzong-Yi
author_sort Luo, Mengqi
collection PubMed
description BACKGROUND: Ubiquitylation is an important post-translational modification of proteins that not only plays a central role in cellular coding, but is also closely associated with the development of a variety of diseases. The specific selection of substrates by E3 ligases is the key step in ubiquitylation. As various high-throughput analytical techniques continue to be applied to the study of ubiquitylation, large amounts of ubiquitylation site data and records of E3-substrate interactions continue to be generated. Biomedical literature is an important vehicle for information on E3-substrate interactions in ubiquitylation and related new discoveries, as well as an important channel for researchers to obtain such up-to-date data. The continuous explosion of ubiquitylation-related literature poses a great challenge to researchers in acquiring and analyzing the information. Therefore, automatic annotation of these E3-substrate interaction sentences from the available literature is urgently needed. RESULTS: In this research, we propose a model based on representation and attention-mechanism-based deep learning methods to automatically annotate E3-substrate interaction sentences in biomedical literature. Focusing on sentences that mention an E3 protein, we applied several natural language processing methods and a Long Short-Term Memory (LSTM)-based deep learning classifier to train the model. Experimental results demonstrated the effectiveness of the proposed model. In addition, the proposed attention-based deep learning method outperformed other statistical machine learning methods. To construct the model, we also created a manually annotated corpus of E3-substrate interaction sentences in which the E3 proteins and substrate proteins are labeled. The corpus and model should be a useful resource for advancing ubiquitylation-related research.
CONCLUSION: Having the entire manually annotated corpus of E3-substrate interaction sentences readily available in electronic form will greatly facilitate subsequent text mining and machine learning analyses. Automatic annotation of ubiquitylation sentences stating E3 ligase-substrate interactions benefits significantly from semantic representation and deep learning. The model enables rapid access to information and can assist in further screening of key ubiquitylation ligase substrates for in-depth studies.
format Online
Article
Text
id pubmed-8524957
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8524957 2021-10-22 A representation and deep learning model for annotating ubiquitylation sentences stating E3 ligase - substrate interaction Luo, Mengqi Li, Zhongyan Li, Shangfu Lee, Tzong-Yi BMC Bioinformatics Research BACKGROUND: Ubiquitylation is an important post-translational modification of proteins that not only plays a central role in cellular coding, but is also closely associated with the development of a variety of diseases. The specific selection of substrates by E3 ligases is the key step in ubiquitylation. As various high-throughput analytical techniques continue to be applied to the study of ubiquitylation, large amounts of ubiquitylation site data and records of E3-substrate interactions continue to be generated. Biomedical literature is an important vehicle for information on E3-substrate interactions in ubiquitylation and related new discoveries, as well as an important channel for researchers to obtain such up-to-date data. The continuous explosion of ubiquitylation-related literature poses a great challenge to researchers in acquiring and analyzing the information. Therefore, automatic annotation of these E3-substrate interaction sentences from the available literature is urgently needed. RESULTS: In this research, we propose a model based on representation and attention-mechanism-based deep learning methods to automatically annotate E3-substrate interaction sentences in biomedical literature. Focusing on sentences that mention an E3 protein, we applied several natural language processing methods and a Long Short-Term Memory (LSTM)-based deep learning classifier to train the model. Experimental results demonstrated the effectiveness of the proposed model. In addition, the proposed attention-based deep learning method outperformed other statistical machine learning methods.
To construct the model, we also created a manually annotated corpus of E3-substrate interaction sentences in which the E3 proteins and substrate proteins are labeled. The corpus and model should be a useful resource for advancing ubiquitylation-related research. CONCLUSION: Having the entire manually annotated corpus of E3-substrate interaction sentences readily available in electronic form will greatly facilitate subsequent text mining and machine learning analyses. Automatic annotation of ubiquitylation sentences stating E3 ligase-substrate interactions benefits significantly from semantic representation and deep learning. The model enables rapid access to information and can assist in further screening of key ubiquitylation ligase substrates for in-depth studies. BioMed Central 2021-10-18 /pmc/articles/PMC8524957/ /pubmed/34663215 http://dx.doi.org/10.1186/s12859-021-04435-7 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle Research
Luo, Mengqi
Li, Zhongyan
Li, Shangfu
Lee, Tzong-Yi
A representation and deep learning model for annotating ubiquitylation sentences stating E3 ligase - substrate interaction
title A representation and deep learning model for annotating ubiquitylation sentences stating E3 ligase - substrate interaction
title_full A representation and deep learning model for annotating ubiquitylation sentences stating E3 ligase - substrate interaction
title_fullStr A representation and deep learning model for annotating ubiquitylation sentences stating E3 ligase - substrate interaction
title_full_unstemmed A representation and deep learning model for annotating ubiquitylation sentences stating E3 ligase - substrate interaction
title_short A representation and deep learning model for annotating ubiquitylation sentences stating E3 ligase - substrate interaction
title_sort representation and deep learning model for annotating ubiquitylation sentences stating e3 ligase - substrate interaction
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8524957/
https://www.ncbi.nlm.nih.gov/pubmed/34663215
http://dx.doi.org/10.1186/s12859-021-04435-7