Bayesian neural network with pretrained protein embedding enhances prediction accuracy of drug-protein interaction

Bibliographic Details
Main Authors: Kim, QHwan; Ko, Joon-Hyuk; Kim, Sunghoon; Park, Nojun; Jhe, Wonho
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8545317/
https://www.ncbi.nlm.nih.gov/pubmed/33978713
http://dx.doi.org/10.1093/bioinformatics/btab346
Description
Summary:
MOTIVATION: Characterizing drug–protein interactions (DPIs) is crucial to high-throughput screening for drug discovery. Deep learning-based approaches have attracted attention because they can predict DPIs without human trial and error. However, because data labeling requires significant resources, the amount of available protein data is relatively small, which decreases model performance. Here, we propose two methods to construct a deep learning framework that exhibits superior performance with a small labeled dataset.

RESULTS: First, we use transfer learning to encode protein sequences with a pretrained model that learns general sequence representations in an unsupervised manner. Second, we use a Bayesian neural network to make a robust model by estimating the data uncertainty. Our resulting model performs better than the previous baselines at predicting interactions between molecules and proteins. We also show that the quantified uncertainty from the Bayesian inference is related to confidence and can be used for screening DPI data points.

AVAILABILITY AND IMPLEMENTATION: The code is available at https://github.com/QHwan/PretrainDPI.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
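
The idea of screening DPI data points by quantified uncertainty can be illustrated with a minimal sketch. It is not the authors' implementation (see the repository above for that). It assumes a toy logistic model whose inputs are randomly dropped on each forward pass, so that the spread of repeated predictions serves as a rough stand-in for predictive uncertainty. All names here (`stochastic_forward`, `predict_with_uncertainty`, `screen`, the weights and the threshold) are hypothetical and chosen only for illustration.

```python
import math
import random
import statistics


def stochastic_forward(features, weights, drop_prob, rng):
    """One forward pass of a toy logistic model with random input dropout."""
    keep = 1.0 - drop_prob
    total = 0.0
    for x, w in zip(features, weights):
        if rng.random() < keep:
            total += w * x / keep  # inverted-dropout rescaling
    return 1.0 / (1.0 + math.exp(-total))


def predict_with_uncertainty(features, weights, n_samples=200, drop_prob=0.2, seed=0):
    """Return (mean, standard deviation) over repeated stochastic passes."""
    rng = random.Random(seed)
    samples = [
        stochastic_forward(features, weights, drop_prob, rng)
        for _ in range(n_samples)
    ]
    return statistics.fmean(samples), statistics.pstdev(samples)


def screen(pairs, weights, max_std):
    """Keep only the pairs whose prediction spread is at most max_std."""
    kept = []
    for name, features in pairs:
        mean, std = predict_with_uncertainty(features, weights)
        if std <= max_std:
            kept.append((name, mean, std))
    return kept


if __name__ == "__main__":
    # Hypothetical weights and feature vectors, for illustration only.
    weights = [1.5, -2.0, 0.5]
    pairs = [("pair_a", [0.1, 0.0, 0.2]), ("pair_b", [3.0, 2.5, 1.0])]
    for name, mean, std in screen(pairs, weights, max_std=0.1):
        print(name, round(mean, 3), round(std, 3))
```

In this sketch a pair is retained only when its repeated predictions agree closely. The threshold `max_std` is an arbitrary illustrative value, and the abstract does not specify how the paper sets its screening criterion.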