SnapKin: a snapshot deep learning ensemble for kinase-substrate prediction from phosphoproteomics data

A major challenge in mass spectrometry-based phosphoproteomics lies in identifying the substrates of kinases, as currently only a small fraction of substrates identified can be confidently linked with a known kinase. Machine learning techniques are promising approaches for leveraging large-scale pho...

Bibliographic Details
Main Authors:	Xiao, Di; Lin, Michael; Liu, Chunlei; Geddes, Thomas A; Burchfield, James G; Parker, Benjamin L; Humphrey, Sean J; Yang, Pengyi
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2023
Subjects:	Methods Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10632189/ https://www.ncbi.nlm.nih.gov/pubmed/37954574 http://dx.doi.org/10.1093/nargab/lqad099

author	Xiao, Di; Lin, Michael; Liu, Chunlei; Geddes, Thomas A; Burchfield, James G; Parker, Benjamin L; Humphrey, Sean J; Yang, Pengyi
author_sort	Xiao, Di
collection	PubMed
description	A major challenge in mass spectrometry-based phosphoproteomics lies in identifying the substrates of kinases, as currently only a small fraction of substrates identified can be confidently linked with a known kinase. Machine learning techniques are promising approaches for leveraging large-scale phosphoproteomics data to computationally predict substrates of kinases. However, the small number of experimentally validated kinase substrates (true positive) and the high data noise in many phosphoproteomics datasets together limit their applicability and utility. Here, we aim to develop advanced kinase-substrate prediction methods to address these challenges. Using a collection of seven large phosphoproteomics datasets, and both traditional and deep learning models, we first demonstrate that a ‘pseudo-positive’ learning strategy for alleviating small sample size is effective at improving model predictive performance. We next show that a data resampling-based ensemble learning strategy is useful for improving model stability while further enhancing prediction. Lastly, we introduce an ensemble deep learning model (‘SnapKin’) by incorporating the above two learning strategies into a ‘snapshot’ ensemble learning algorithm. We propose SnapKin, an ensemble deep learning method, for predicting substrates of kinases from large-scale phosphoproteomics data. We demonstrate that SnapKin consistently outperforms existing methods in kinase-substrate prediction. SnapKin is freely available at https://github.com/PYangLab/SnapKin.
format	Online Article Text
id	pubmed-10632189
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-10632189 2023-11-10. NAR Genom Bioinform. Methods Article. Oxford University Press, 2023-11-06. /pmc/articles/PMC10632189/ /pubmed/37954574 http://dx.doi.org/10.1093/nargab/lqad099 Text en. © The Author(s) 2023. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title	SnapKin: a snapshot deep learning ensemble for kinase-substrate prediction from phosphoproteomics data
topic	Methods Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10632189/ https://www.ncbi.nlm.nih.gov/pubmed/37954574 http://dx.doi.org/10.1093/nargab/lqad099
