SNARE-CNN: a 2D convolutional neural network architecture to identify SNARE proteins from high-throughput sequencing data

Bibliographic Details
Main Authors: Le, Nguyen Quoc Khanh; Nguyen, Van-Nui
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2019
Subjects: Bioinformatics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7924420/
https://www.ncbi.nlm.nih.gov/pubmed/33816830
http://dx.doi.org/10.7717/peerj-cs.177
author Le, Nguyen Quoc Khanh
Nguyen, Van-Nui
collection PubMed
description Deep learning is increasingly used to solve problems in many fields with state-of-the-art performance. It can also be applied in bioinformatics to reduce the need for feature extraction while reaching high performance. This study uses deep learning to predict SNARE proteins, which perform one of the most vital molecular functions in life science. Functional loss of SNARE proteins has been implicated in a variety of human diseases (e.g., neurodegenerative diseases, mental illness, and cancer). Creating a precise model to identify their functions is therefore crucial for understanding these diseases and designing drug targets. Our SNARE-CNN model, which uses two-dimensional convolutional neural networks and position-specific scoring matrix profiles, identified SNARE proteins with a sensitivity of 76.6%, specificity of 93.5%, accuracy of 89.7%, and MCC of 0.7 on the cross-validation dataset. We also evaluated the model on an independent dataset, and the results show that it avoids the overfitting problem. Compared with other state-of-the-art methods, this approach achieved significant improvement in all of the metrics. This study provides an effective model for identifying SNARE proteins and a basis for further research applying deep learning in bioinformatics, especially in protein function prediction. SNARE-CNN is freely available at https://github.com/khanhlee/snare-cnn.
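The four metrics reported in the description (sensitivity, specificity, accuracy, and MCC) are all derived from the counts of a binary confusion matrix. The following is a minimal sketch using their standard definitions, not the authors' evaluation code; the function name `binary_metrics` and the example counts are hypothetical and chosen only for illustration.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion-matrix counts.

    tp/fn: positive (e.g. SNARE) samples predicted correctly/incorrectly.
    tn/fp: negative samples predicted correctly/incorrectly.
    """
    total = tp + fp + tn + fn
    # Sensitivity (recall): fraction of actual positives that were found.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of actual negatives that were correctly rejected.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # Accuracy: fraction of all samples that were classified correctly.
    accuracy = (tp + tn) / total if total else 0.0
    # Matthews correlation coefficient; defined as 0 when a marginal is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts for illustration only; these are NOT results from the paper.
metrics = binary_metrics(tp=30, fp=10, tn=50, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

MCC is often reported alongside accuracy for datasets where negatives outnumber positives, because it accounts for all four cells of the confusion matrix.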
format Online
Article
Text
id pubmed-7924420
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-7924420 2021-04-02 PeerJ Comput Sci Bioinformatics PeerJ Inc. 2019-02-25 /pmc/articles/PMC7924420/ /pubmed/33816830 http://dx.doi.org/10.7717/peerj-cs.177 Text en ©2019 Le and Nguyen http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
title SNARE-CNN: a 2D convolutional neural network architecture to identify SNARE proteins from high-throughput sequencing data
topic Bioinformatics