RF-MaloSite and DL-Malosite: Methods based on random forest and deep learning to identify malonylation sites

Malonylation, which has recently emerged as an important lysine modification, regulates diverse biological activities and has been implicated in several pervasive disorders, including cardiovascular disease and cancer. However, conventional global proteomics analysis using tandem mass spectrometry c...

Bibliographic Details
Main authors: AL-barakati, Hussam, Thapa, Niraj, Hiroto, Saigo, Roy, Kaushik, Newman, Robert H., KC, Dukka
Format: Online Article Text
Language: English
Published: Research Network of Computational and Structural Biotechnology 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7160427/
https://www.ncbi.nlm.nih.gov/pubmed/32322367
http://dx.doi.org/10.1016/j.csbj.2020.02.012
author AL-barakati, Hussam
Thapa, Niraj
Hiroto, Saigo
Roy, Kaushik
Newman, Robert H.
KC, Dukka
collection PubMed
description Malonylation, which has recently emerged as an important lysine modification, regulates diverse biological activities and has been implicated in several pervasive disorders, including cardiovascular disease and cancer. However, conventional global proteomics analysis using tandem mass spectrometry can be time-consuming, expensive and technically challenging. Therefore, to complement and extend existing experimental methods for malonylation site identification, we developed two novel computational methods for malonylation site prediction, RF-MaloSite and DL-MaloSite, based on random forest and deep learning algorithms, respectively. DL-MaloSite requires the primary amino acid sequence as an input, whereas RF-MaloSite utilizes a diverse set of biochemical, physicochemical and sequence-based features. Systematic assessment suggests that both RF-MaloSite and DL-MaloSite perform well in all metrics tested, and particularly well in accuracy, sensitivity and overall method performance (assessed by the Matthews correlation coefficient, MCC). For instance, RF-MaloSite exhibited MCC scores of 0.42 and 0.40 using 10-fold cross-validation and an independent test set, respectively. Meanwhile, DL-MaloSite was characterized by MCC scores of 0.51 and 0.49 based on 10-fold cross-validation and an independent test set, respectively. Importantly, both methods exhibited efficiency scores that were on par with or better than those achieved by existing malonylation site prediction methods. The identification of these sites may also provide important insights into the mechanisms of crosstalk between malonylation and other lysine modifications, such as acetylation, glutarylation and succinylation. To facilitate their use, both methods have been made freely available to the research community at https://github.com/dukkakc/DL-MaloSite-and-RF-MaloSite.
format Online
Article
Text
id pubmed-7160427
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Research Network of Computational and Structural Biotechnology
record_format MEDLINE/PubMed
spelling pubmed-7160427 2020-04-22 RF-MaloSite and DL-Malosite: Methods based on random forest and deep learning to identify malonylation sites AL-barakati, Hussam Thapa, Niraj Hiroto, Saigo Roy, Kaushik Newman, Robert H. KC, Dukka Comput Struct Biotechnol J Research Article Research Network of Computational and Structural Biotechnology 2020-03-04 /pmc/articles/PMC7160427/ /pubmed/32322367 http://dx.doi.org/10.1016/j.csbj.2020.02.012 Text en © 2020 The Authors http://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
title RF-MaloSite and DL-Malosite: Methods based on random forest and deep learning to identify malonylation sites
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7160427/
https://www.ncbi.nlm.nih.gov/pubmed/32322367
http://dx.doi.org/10.1016/j.csbj.2020.02.012
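
The description in this record reports overall performance as Matthews correlation coefficient (MCC) scores. As a reading aid, the sketch below shows how an MCC value is conventionally computed from the four counts of a binary confusion matrix. It is a minimal illustration and is not taken from the authors' code; the function name `mcc` and the example counts are hypothetical and are not results from the article.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts.

    tp/tn/fp/fn are true positives, true negatives, false positives and
    false negatives. By convention this sketch returns 0.0 when the
    denominator is zero (for example, when one class is never predicted).
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts chosen only for illustration (not from the article).
example_score = mcc(tp=80, tn=70, fp=30, fn=20)
print(f"MCC = {example_score:.2f}")
```

MCC ranges from -1 (every prediction wrong) through 0 (no better than chance) to +1 (every prediction right), which is why it is often preferred over accuracy as a single summary figure when modified and unmodified sites are unevenly represented.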