
A Deep Learning Approach with Data Augmentation to Predict Novel Spider Neurotoxic Peptides


Bibliographic Details
Main authors: Lee, Byungjo, Shin, Min Kyoung, Hwang, In-Wook, Jung, Junghyun, Shim, Yu Jeong, Kim, Go Woon, Kim, Seung Tae, Jang, Wonhee, Sung, Jung-Suk
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8619404/
https://www.ncbi.nlm.nih.gov/pubmed/34830173
http://dx.doi.org/10.3390/ijms222212291
collection PubMed
description As major components of spider venoms, neurotoxic peptides exhibit structural diversity and target specificity, and have great pharmaceutical potential. Deep learning may offer an alternative to the laborious and time-consuming methods currently used to identify these peptides. However, the major hurdle in developing a deep learning model is the limited data available on neurotoxic peptides. Here, we present a peptide data augmentation method that improves the recognition of neurotoxic peptides by a convolutional neural network model. The neurotoxic peptide data were augmented using the known neurotoxic peptides from the UniProt database, and models were trained on a training set with or without the generated sequences to verify the augmented data. The model trained on the augmented dataset outperformed the one trained on the unaugmented dataset, achieving an accuracy of 0.9953, precision of 0.9922, recall of 0.9984, and F1 score of 0.9953 on the simulation dataset. Applying the model to the set of all RNA transcripts of the spider Callobius koreanus, we discovered 275 putative neurotoxic peptides, of which 252 were novel sequences and only 23 showed homology with known peptides according to the Basic Local Alignment Search Tool (BLAST). Of these 275 peptides, four were selected and shown to have neuromodulatory effects on the human neuroblastoma cell line SH-SY5Y. The augmentation method presented here may be applied to identify other functional peptides from biological resources with insufficient data.
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
Published in Int J Mol Sci, MDPI, 2021-11-13. © 2021 by the authors. Licensee MDPI, Basel, Switzerland.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).