SnapKin: a snapshot deep learning ensemble for kinase-substrate prediction from phosphoproteomics data

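A major challenge in mass spectrometry-based phosphoproteomics lies in identifying the substrates of kinases, as currently only a small fraction of substrates identified can be confidently linked with a known kinase. Machine learning techniques are promising approaches for leveraging large-scale phosphoproteomics data to computationally predict substrates of kinases. However, the small number of experimentally validated kinase substrates (true positive) and the high data noise in many phosphoproteomics datasets together limit their applicability and utility. Here, we aim to develop advanced kinase-substrate prediction methods to address these challenges. Using a collection of seven large phosphoproteomics datasets, and both traditional and deep learning models, we first demonstrate that a ‘pseudo-positive’ learning strategy for alleviating small sample size is effective at improving model predictive performance. We next show that a data resampling-based ensemble learning strategy is useful for improving model stability while further enhancing prediction. Lastly, we introduce an ensemble deep learning model (‘SnapKin’) by incorporating the above two learning strategies into a ‘snapshot’ ensemble learning algorithm. We propose SnapKin, an ensemble deep learning method, for predicting substrates of kinases from large-scale phosphoproteomics data. We demonstrate that SnapKin consistently outperforms existing methods in kinase-substrate prediction. SnapKin is freely available at https://github.com/PYangLab/SnapKin.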

Bibliographic Details
Main authors: Xiao, Di; Lin, Michael; Liu, Chunlei; Geddes, Thomas A; Burchfield, James G; Parker, Benjamin L; Humphrey, Sean J; Yang, Pengyi
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects: Methods Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10632189/
https://www.ncbi.nlm.nih.gov/pubmed/37954574
http://dx.doi.org/10.1093/nargab/lqad099
author Xiao, Di
Lin, Michael
Liu, Chunlei
Geddes, Thomas A
Burchfield, James G
Parker, Benjamin L
Humphrey, Sean J
Yang, Pengyi
collection PubMed
description A major challenge in mass spectrometry-based phosphoproteomics lies in identifying the substrates of kinases, as currently only a small fraction of substrates identified can be confidently linked with a known kinase. Machine learning techniques are promising approaches for leveraging large-scale phosphoproteomics data to computationally predict substrates of kinases. However, the small number of experimentally validated kinase substrates (true positive) and the high data noise in many phosphoproteomics datasets together limit their applicability and utility. Here, we aim to develop advanced kinase-substrate prediction methods to address these challenges. Using a collection of seven large phosphoproteomics datasets, and both traditional and deep learning models, we first demonstrate that a ‘pseudo-positive’ learning strategy for alleviating small sample size is effective at improving model predictive performance. We next show that a data resampling-based ensemble learning strategy is useful for improving model stability while further enhancing prediction. Lastly, we introduce an ensemble deep learning model (‘SnapKin’) by incorporating the above two learning strategies into a ‘snapshot’ ensemble learning algorithm. We propose SnapKin, an ensemble deep learning method, for predicting substrates of kinases from large-scale phosphoproteomics data. We demonstrate that SnapKin consistently outperforms existing methods in kinase-substrate prediction. SnapKin is freely available at https://github.com/PYangLab/SnapKin.
format Online
Article
Text
id pubmed-10632189
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
journal NAR Genom Bioinform
date 2023-11-06
rights © The Author(s) 2023. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics.
license This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title SnapKin: a snapshot deep learning ensemble for kinase-substrate prediction from phosphoproteomics data
topic Methods Article