Identifying antimicrobial peptides using word embedding with deep recurrent neural networks

MOTIVATION: Antibiotic resistance constitutes a major public health crisis, and finding new sources of antimicrobial drugs is crucial to solving it. Bacteriocins, which are bacterially produced antimicrobial peptide products, are candidates for broadening the available choices of antimicrobials. How...

Bibliographic Details
Main authors: Hamid, Md-Nafiz; Friedberg, Iddo
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6581433/
https://www.ncbi.nlm.nih.gov/pubmed/30418485
http://dx.doi.org/10.1093/bioinformatics/bty937
_version_ 1783428166226804736
author Hamid, Md-Nafiz
Friedberg, Iddo
author_facet Hamid, Md-Nafiz
Friedberg, Iddo
author_sort Hamid, Md-Nafiz
collection PubMed
description MOTIVATION: Antibiotic resistance constitutes a major public health crisis, and finding new sources of antimicrobial drugs is crucial to solving it. Bacteriocins, which are bacterially produced antimicrobial peptide products, are candidates for broadening the available choices of antimicrobials. However, the discovery of new bacteriocins by genomic mining is hampered by their sequences’ low complexity and high variance, which frustrates sequence similarity-based searches. RESULTS: Here we use word embeddings of protein sequences to represent bacteriocins, and apply a word embedding method that accounts for amino acid order in protein sequences, to predict novel bacteriocins from protein sequences without using sequence similarity. Our method predicts, with a high probability, six yet unknown putative bacteriocins in Lactobacillus. Generalized, the representation of sequences with word embeddings preserving sequence order information can be applied to peptide and protein classification problems for which sequence similarity cannot be used. AVAILABILITY AND IMPLEMENTATION: Data and source code for this project are freely available at: https://github.com/nafizh/NeuBI. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-6581433
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6581433 2019-06-21 Identifying antimicrobial peptides using word embedding with deep recurrent neural networks Hamid, Md-Nafiz Friedberg, Iddo Bioinformatics Original Papers MOTIVATION: Antibiotic resistance constitutes a major public health crisis, and finding new sources of antimicrobial drugs is crucial to solving it. Bacteriocins, which are bacterially produced antimicrobial peptide products, are candidates for broadening the available choices of antimicrobials. However, the discovery of new bacteriocins by genomic mining is hampered by their sequences’ low complexity and high variance, which frustrates sequence similarity-based searches. RESULTS: Here we use word embeddings of protein sequences to represent bacteriocins, and apply a word embedding method that accounts for amino acid order in protein sequences, to predict novel bacteriocins from protein sequences without using sequence similarity. Our method predicts, with a high probability, six yet unknown putative bacteriocins in Lactobacillus. Generalized, the representation of sequences with word embeddings preserving sequence order information can be applied to peptide and protein classification problems for which sequence similarity cannot be used. AVAILABILITY AND IMPLEMENTATION: Data and source code for this project are freely available at: https://github.com/nafizh/NeuBI. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. Oxford University Press 2019-06 2018-11-10 /pmc/articles/PMC6581433/ /pubmed/30418485 http://dx.doi.org/10.1093/bioinformatics/bty937 Text en © The Author(s) 2018. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Identifying antimicrobial peptides using word embedding with deep recurrent neural networks
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6581433/
https://www.ncbi.nlm.nih.gov/pubmed/30418485
http://dx.doi.org/10.1093/bioinformatics/bty937