Identifying SNAREs by Incorporating Deep Learning Architecture and Amino Acid Embedding Representation

SNAREs (soluble N-ethylmaleimide-sensitive factor activating protein receptors) are a group of proteins that are crucial for membrane fusion and exocytosis of neurotransmitters from the cell. They play an important role in a broad range of cell processes, including cell growth, cytokinesis, and syna...

Bibliographic Details
Main Authors: Le, Nguyen Quoc Khanh; Huynh, Tuan-Tu
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6914855/
https://www.ncbi.nlm.nih.gov/pubmed/31920706
http://dx.doi.org/10.3389/fphys.2019.01501
_version_ 1783479897112444928
author Le, Nguyen Quoc Khanh
Huynh, Tuan-Tu
author_facet Le, Nguyen Quoc Khanh
Huynh, Tuan-Tu
author_sort Le, Nguyen Quoc Khanh
collection PubMed
description SNAREs (soluble N-ethylmaleimide-sensitive factor activating protein receptors) are a group of proteins that are crucial for membrane fusion and exocytosis of neurotransmitters from the cell. They play an important role in a broad range of cell processes, including cell growth, cytokinesis, and synaptic transmission, to promote cell membrane integration in eukaryotes. Many studies have associated SNARE proteins with human diseases, especially cancer. Identifying their functions is therefore a challenging problem whose solution would help scientists better understand cancer and design drug targets for treatment. We described each protein sequence with amino acid embeddings generated by fastText, a natural language processing model that performs well in its field. Because a protein sequence resembles a sentence made of different words, applying a language model to protein sequences is both challenging and promising. The generated amino acid embedding features were then fed into a deep learning algorithm for prediction. Our model, which combines the fastText model and deep convolutional neural networks, identified SNARE proteins with an independent test accuracy of 92.8%, sensitivity of 88.5%, specificity of 97%, and Matthews correlation coefficient (MCC) of 0.86. These results were superior to those of the state-of-the-art predictor (SNARE-CNN). We suggest this study as a reliable method for biologists to identify SNAREs, and it serves as a basis for applying the fastText word embedding model in bioinformatics, especially in protein sequencing prediction.
format Online
Article
Text
id pubmed-6914855
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-6914855 2020-01-09 Identifying SNAREs by Incorporating Deep Learning Architecture and Amino Acid Embedding Representation Le, Nguyen Quoc Khanh Huynh, Tuan-Tu Front Physiol Physiology Frontiers Media S.A. 2019-12-10 /pmc/articles/PMC6914855/ /pubmed/31920706 http://dx.doi.org/10.3389/fphys.2019.01501 Text en Copyright © 2019 Le and Huynh.
http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Physiology
Le, Nguyen Quoc Khanh
Huynh, Tuan-Tu
Identifying SNAREs by Incorporating Deep Learning Architecture and Amino Acid Embedding Representation
title Identifying SNAREs by Incorporating Deep Learning Architecture and Amino Acid Embedding Representation
topic Physiology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6914855/
https://www.ncbi.nlm.nih.gov/pubmed/31920706
http://dx.doi.org/10.3389/fphys.2019.01501
work_keys_str_mv AT lenguyenquockhanh identifyingsnaresbyincorporatingdeeplearningarchitectureandaminoacidembeddingrepresentation
AT huynhtuantu identifyingsnaresbyincorporatingdeeplearningarchitectureandaminoacidembeddingrepresentation