Deep learning-based multi-functional therapeutic peptides prediction with a multi-label focal dice loss function

Bibliographic Details
Main authors: Fan, Henghui, Yan, Wenhui, Wang, Lihua, Liu, Jie, Bin, Yannan, Xia, Junfeng
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10234765/
https://www.ncbi.nlm.nih.gov/pubmed/37216900
http://dx.doi.org/10.1093/bioinformatics/btad334
Description
Summary: MOTIVATION: With the large number of peptide sequences produced in the postgenomic era, it is highly desirable to identify the various functions of therapeutic peptides quickly. Furthermore, it is a great challenge to accurately predict multi-functional therapeutic peptides (MFTP) via sequence-based computational tools. RESULTS: Here, we propose a novel multi-label-based method, named ETFC, to predict 21 categories of therapeutic peptides. The method utilizes a deep learning-based model architecture, which consists of four blocks: embedding, text convolutional neural network, feed-forward network, and classification blocks. This method also adopts an imbalanced learning strategy with a novel multi-label focal dice loss function. The multi-label focal dice loss is applied in the ETFC method to address the inherent imbalance problem in the multi-label dataset and achieve competitive performance. The experimental results show that the ETFC method is significantly better than the existing methods for MFTP prediction. With the established framework, we use teacher–student-based knowledge distillation to obtain the attention weights from the self-attention mechanism in the MFTP prediction and quantify their contributions toward each of the investigated activities. AVAILABILITY AND IMPLEMENTATION: The source code and dataset are available via: https://github.com/xialab-ahu/ETFC.
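
The summary does not give the formula for the multi-label focal dice loss, so the following is only a minimal sketch of the general idea, assuming a generic per-label soft Dice term raised to a focusing exponent. The function names, the `gamma` and `eps` parameters, and the exact form of the loss are illustrative assumptions, not the ETFC definition; the authors' implementation is in the repository linked above.

```python
def dice_coefficient(probs, targets, eps=1e-7):
    """Soft Dice overlap between predicted probabilities and 0/1 targets
    for a single label, computed over all samples in the batch."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    denom = sum(p * p for p in probs) + sum(t * t for t in targets)
    # eps keeps the ratio defined when a label has no positives
    # and the model predicts none.
    return (2.0 * intersection + eps) / (denom + eps)


def multilabel_focal_dice_loss(prob_rows, target_rows, gamma=2.0, eps=1e-7):
    """Hypothetical focal-style Dice loss averaged over labels.

    prob_rows:   list of samples, each a list of per-label probabilities.
    target_rows: list of samples, each a list of per-label 0/1 targets.
    gamma:       assumed focusing exponent; values above 1 shrink the
                 contribution of labels that are already well predicted.
    """
    num_labels = len(target_rows[0])
    total = 0.0
    for j in range(num_labels):
        probs = [row[j] for row in prob_rows]
        targets = [row[j] for row in target_rows]
        dice = dice_coefficient(probs, targets, eps)
        total += (1.0 - dice) ** gamma
    return total / num_labels


# Two peptides, two activity labels (toy values, not real data).
targets = [[1, 0], [0, 1]]
perfect = [[1.0, 0.0], [0.0, 1.0]]
inverted = [[0.0, 1.0], [1.0, 0.0]]
loss_perfect = multilabel_focal_dice_loss(perfect, targets)
loss_inverted = multilabel_focal_dice_loss(inverted, targets)
```

Because the Dice term is computed per label from the overlap with the positive samples, a rare label contributes to the average on the same footing as a frequent one, which is the usual motivation for Dice-style losses on imbalanced multi-label data.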