PeptideBERT: A Language Model Based on Transformers for Peptide Property Prediction

Bibliographic Details
Autores principales: Guntuboina, Chakradhar, Das, Adrita, Mollaei, Parisa, Kim, Seongwon, Barati Farimani, Amir
Format: Online Article Text
Language: English
Published: American Chemical Society, 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10683064/
https://www.ncbi.nlm.nih.gov/pubmed/37956397
http://dx.doi.org/10.1021/acs.jpclett.3c02398
author Guntuboina, Chakradhar
Das, Adrita
Mollaei, Parisa
Kim, Seongwon
Barati Farimani, Amir
collection PubMed
description Recent advances in language models have provided the protein modeling community with a powerful tool that uses transformers to represent protein sequences as text. This breakthrough enables sequence-to-property prediction for peptides without relying on explicit structural data. Inspired by recent progress in large language models, we present PeptideBERT, a protein language model specifically tailored to predicting essential peptide properties such as hemolysis, solubility, and nonfouling. PeptideBERT builds on the pretrained ProtBERT transformer model, which has 12 attention heads and 12 hidden layers. By fine-tuning the pretrained model on the three downstream tasks, our model achieves state-of-the-art (SOTA) performance in predicting hemolysis, which is crucial for determining a peptide's potential to lyse red blood cells, and in predicting nonfouling. Leveraging primarily shorter sequences, and a data set in which negative samples predominantly correspond to insoluble peptides, our model shows remarkable performance.
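The fine-tuning setup the abstract describes can be sketched with the Hugging Face Transformers library. Below is a minimal, illustrative example, assuming the public Rostlab/prot_bert checkpoint (ProtBERT) and a binary classification head; the peptide sequence, label, and learning rate are placeholders, not the paper's actual training configuration.

```python
# Illustrative sketch: fine-tuning ProtBERT for a binary peptide
# property task (e.g., hemolytic vs. non-hemolytic). The checkpoint
# name is the public ProtBERT model; the sequence, label, and
# learning rate below are placeholder assumptions.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("Rostlab/prot_bert", do_lower_case=False)
model = BertForSequenceClassification.from_pretrained(
    "Rostlab/prot_bert", num_labels=2  # binary classification head
)

# ProtBERT expects amino acids as space-separated uppercase residues.
sequence = "G I G A V L K V L T T G L"  # placeholder peptide
inputs = tokenizer(sequence, return_tensors="pt")
labels = torch.tensor([1])              # placeholder label

# A single fine-tuning step; a real run would iterate over a labeled dataset.
model.train()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
outputs = model(**inputs, labels=labels)
outputs.loss.backward()
optimizer.step()
```

The same pattern would apply to each of the three tasks (hemolysis, solubility, nonfouling), with one fine-tuned classifier per property.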
format Online
Article
Text
id pubmed-10683064
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher American Chemical Society
record_format MEDLINE/PubMed
spelling pubmed-10683064 2023-11-30
journal J Phys Chem Lett
published 2023-11-13
license © 2023 The Authors. Published by American Chemical Society. CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/): permits the broadest form of re-use, including for commercial purposes, provided that author attribution and integrity are maintained.
title PeptideBERT: A Language Model Based on Transformers for Peptide Property Prediction
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10683064/
https://www.ncbi.nlm.nih.gov/pubmed/37956397
http://dx.doi.org/10.1021/acs.jpclett.3c02398