PeptideBERT: A Language Model Based on Transformers for Peptide Property Prediction
Recent advances in language models have given the protein modeling community a powerful tool that uses transformers to represent protein sequences as text. This breakthrough enables sequence-to-property prediction for peptides without relying on explicit structural data. I...
Main authors: | Guntuboina, Chakradhar; Das, Adrita; Mollaei, Parisa; Kim, Seongwon; Barati Farimani, Amir |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | American Chemical Society 2023 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10683064/ https://www.ncbi.nlm.nih.gov/pubmed/37956397 http://dx.doi.org/10.1021/acs.jpclett.3c02398 |
_version_ | 1785151109312020480 |
---|---|
author | Guntuboina, Chakradhar Das, Adrita Mollaei, Parisa Kim, Seongwon Barati Farimani, Amir |
author_facet | Guntuboina, Chakradhar Das, Adrita Mollaei, Parisa Kim, Seongwon Barati Farimani, Amir |
author_sort | Guntuboina, Chakradhar |
collection | PubMed |
description | Recent advances in language models have given the protein modeling community a powerful tool that uses transformers to represent protein sequences as text. This breakthrough enables sequence-to-property prediction for peptides without relying on explicit structural data. Inspired by recent progress in large language models, we present PeptideBERT, a protein language model specifically tailored for predicting essential peptide properties such as hemolysis, solubility, and nonfouling. PeptideBERT uses the ProtBERT pretrained transformer model with 12 attention heads and 12 hidden layers. Through fine-tuning the pretrained model for the three downstream tasks, our model is state of the art (SOTA) in predicting hemolysis, which is crucial for determining a peptide’s potential to induce red blood cell lysis, as well as nonfouling properties. Leveraging primarily shorter sequences and a data set with negative samples predominantly associated with insoluble peptides, our model showcases remarkable performance. |
format | Online Article Text |
id | pubmed-10683064 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | American Chemical Society |
record_format | MEDLINE/PubMed |
spelling | pubmed-10683064 2023-11-30 PeptideBERT: A Language Model Based on Transformers for Peptide Property Prediction Guntuboina, Chakradhar Das, Adrita Mollaei, Parisa Kim, Seongwon Barati Farimani, Amir J Phys Chem Lett American Chemical Society 2023-11-13 /pmc/articles/PMC10683064/ /pubmed/37956397 http://dx.doi.org/10.1021/acs.jpclett.3c02398 Text en © 2023 The Authors. Published by American Chemical Society https://creativecommons.org/licenses/by/4.0/ Permits the broadest form of re-use including for commercial purposes, provided that author attribution and integrity are maintained (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Guntuboina, Chakradhar Das, Adrita Mollaei, Parisa Kim, Seongwon Barati Farimani, Amir PeptideBERT: A Language Model Based on Transformers for Peptide Property Prediction |
title | PeptideBERT: A Language Model Based on Transformers for Peptide Property Prediction |
title_full | PeptideBERT: A Language Model Based on Transformers for Peptide Property Prediction |
title_fullStr | PeptideBERT: A Language Model Based on Transformers for Peptide Property Prediction |
title_full_unstemmed | PeptideBERT: A Language Model Based on Transformers for Peptide Property Prediction |
title_short | PeptideBERT: A Language Model Based on Transformers for Peptide Property Prediction |
title_sort | peptidebert: a language model based on transformers for peptide property prediction |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10683064/ https://www.ncbi.nlm.nih.gov/pubmed/37956397 http://dx.doi.org/10.1021/acs.jpclett.3c02398 |
work_keys_str_mv | AT guntuboinachakradhar peptidebertalanguagemodelbasedontransformersforpeptidepropertyprediction AT dasadrita peptidebertalanguagemodelbasedontransformersforpeptidepropertyprediction AT mollaeiparisa peptidebertalanguagemodelbasedontransformersforpeptidepropertyprediction AT kimseongwon peptidebertalanguagemodelbasedontransformersforpeptidepropertyprediction AT baratifarimaniamir peptidebertalanguagemodelbasedontransformersforpeptidepropertyprediction |