
SSMFN: a fused spatial and sequential deep learning model for methylation site prediction

BACKGROUND: Conventional in vivo methods for post-translational modification site prediction, such as spectrophotometry, Western blotting, and chromatin immunoprecipitation, can be very expensive and time-consuming. Neural networks (NN) are one of the computational approaches that can predict effecti...


Bibliographic Details
Main authors: Lumbanraja, Favorisen Rosyking, Mahesworo, Bharuno, Cenggoro, Tjeng Wawan, Sudigyo, Digdo, Pardamean, Bens
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2021
Subjects: Bioinformatics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8409337/
https://www.ncbi.nlm.nih.gov/pubmed/34541311
http://dx.doi.org/10.7717/peerj-cs.683
author Lumbanraja, Favorisen Rosyking
Mahesworo, Bharuno
Cenggoro, Tjeng Wawan
Sudigyo, Digdo
Pardamean, Bens
collection PubMed
description BACKGROUND: Conventional in vivo methods for post-translational modification site prediction, such as spectrophotometry, Western blotting, and chromatin immunoprecipitation, can be very expensive and time-consuming. Neural networks (NN) are one of the computational approaches that can effectively predict post-translational modification sites. We developed a neural network model, the Sequential and Spatial Methylation Fusion Network (SSMFN), to predict possible methylation sites on protein sequences. METHOD: We designed our model to extract spatial and sequential information from amino acid sequences. A convolutional neural network (CNN) is applied to harness spatial information, while long short-term memory (LSTM) is applied to sequential data. The latent representations of the CNN and LSTM branches are then fused. Afterwards, we compared the performance of our proposed model with state-of-the-art methylation site prediction models on balanced and imbalanced datasets. RESULTS: Our model appeared to perform better in almost all measurements when trained on the balanced training dataset. On the imbalanced training dataset, all of the models performed better, since they were trained on more data. In several metrics, our model also surpasses the PRMePred model, which requires laborious feature extraction and selection. CONCLUSION: Our models achieved the best performance across different environments in almost all measurements. Our results also suggest that an NN model trained on a balanced training dataset and tested on an imbalanced dataset will offer high specificity and low sensitivity. Thus, an NN model for methylation site prediction should be trained on an imbalanced dataset, since in actual applications there are far more negative samples than positive samples.
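The abstract's conclusion rests on two metrics, sensitivity and specificity. As a reader's aid, here is a minimal sketch of how they are computed from binary predictions. The function name `sensitivity_specificity` and the toy labels are illustrative assumptions, not code or data from the article.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels.

    Label 1 marks a methylated (positive) site, label 0 a negative site.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    # Guard against empty classes so the sketch never divides by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


# Hypothetical imbalanced test set (2 positives, 8 negatives); made-up values.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
# prints: sensitivity=0.50 specificity=1.00
```

In this toy case the predictor misses one of the two positives while rejecting every negative, which is the "high specificity, low sensitivity" pattern the conclusion describes.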
format Online
Article
Text
id pubmed-8409337
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-8409337 2021-09-17 SSMFN: a fused spatial and sequential deep learning model for methylation site prediction Lumbanraja, Favorisen Rosyking Mahesworo, Bharuno Cenggoro, Tjeng Wawan Sudigyo, Digdo Pardamean, Bens PeerJ Comput Sci Bioinformatics PeerJ Inc. 2021-08-26 /pmc/articles/PMC8409337/ /pubmed/34541311 http://dx.doi.org/10.7717/peerj-cs.683 Text en ©2021 Lumbanraja et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
title SSMFN: a fused spatial and sequential deep learning model for methylation site prediction
topic Bioinformatics