SnapKin: a snapshot deep learning ensemble for kinase-substrate prediction from phosphoproteomics data
A major challenge in mass spectrometry-based phosphoproteomics lies in identifying the substrates of kinases, as currently only a small fraction of substrates identified can be confidently linked with a known kinase. Machine learning techniques are promising approaches for leveraging large-scale pho...
Main authors: | Xiao, Di; Lin, Michael; Liu, Chunlei; Geddes, Thomas A; Burchfield, James G; Parker, Benjamin L; Humphrey, Sean J; Yang, Pengyi |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2023 |
Subjects: | Methods Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10632189/ https://www.ncbi.nlm.nih.gov/pubmed/37954574 http://dx.doi.org/10.1093/nargab/lqad099 |
_version_ | 1785132526259404800 |
---|---|
author | Xiao, Di Lin, Michael Liu, Chunlei Geddes, Thomas A Burchfield, James G Parker, Benjamin L Humphrey, Sean J Yang, Pengyi |
author_sort | Xiao, Di |
collection | PubMed |
description | A major challenge in mass spectrometry-based phosphoproteomics lies in identifying the substrates of kinases, as currently only a small fraction of substrates identified can be confidently linked with a known kinase. Machine learning techniques are promising approaches for leveraging large-scale phosphoproteomics data to computationally predict substrates of kinases. However, the small number of experimentally validated kinase substrates (true positive) and the high data noise in many phosphoproteomics datasets together limit their applicability and utility. Here, we aim to develop advanced kinase-substrate prediction methods to address these challenges. Using a collection of seven large phosphoproteomics datasets, and both traditional and deep learning models, we first demonstrate that a ‘pseudo-positive’ learning strategy for alleviating small sample size is effective at improving model predictive performance. We next show that a data resampling-based ensemble learning strategy is useful for improving model stability while further enhancing prediction. Lastly, we introduce an ensemble deep learning model (‘SnapKin’) by incorporating the above two learning strategies into a ‘snapshot’ ensemble learning algorithm. We propose SnapKin, an ensemble deep learning method, for predicting substrates of kinases from large-scale phosphoproteomics data. We demonstrate that SnapKin consistently outperforms existing methods in kinase-substrate prediction. SnapKin is freely available at https://github.com/PYangLab/SnapKin. |
format | Online Article Text |
id | pubmed-10632189 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-10632189 2023-11-10 NAR Genom Bioinform Methods Article Oxford University Press 2023-11-06 /pmc/articles/PMC10632189/ /pubmed/37954574 http://dx.doi.org/10.1093/nargab/lqad099 Text en © The Author(s) 2023. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | SnapKin: a snapshot deep learning ensemble for kinase-substrate prediction from phosphoproteomics data |
topic | Methods Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10632189/ https://www.ncbi.nlm.nih.gov/pubmed/37954574 http://dx.doi.org/10.1093/nargab/lqad099 |