Homologous Pairs of Low and High Temperature Originating Proteins Spanning the Known Prokaryotic Universe

Stability of proteins at high temperature has been a topic of interest for many years, as this attribute is favourable for applications ranging from therapeutics to industrial chemical manufacturing. Our current understanding and methods for designing high-temperature stability into target proteins...

Bibliographic Details
Main authors: Komp, Evan; Alanzi, Humood N.; Francis, Ryan; Vuong, Chau; Roberts, Logan; Mosallanejad, Amin; Beck, David A. C.
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10560248/
https://www.ncbi.nlm.nih.gov/pubmed/37805601
http://dx.doi.org/10.1038/s41597-023-02553-w
author Komp, Evan
Alanzi, Humood N.
Francis, Ryan
Vuong, Chau
Roberts, Logan
Mosallanejad, Amin
Beck, David A. C.
collection PubMed
description Stability of proteins at high temperature has been a topic of interest for many years, as this attribute is favourable for applications ranging from therapeutics to industrial chemical manufacturing. Our current understanding and methods for designing high-temperature stability into target proteins are inadequate. To drive innovation in this space, we have curated a large dataset, learn2thermDB, of protein-temperature examples, totalling 24 million instances, and paired proteins across temperatures based on homology, yielding 69 million protein pairs - orders of magnitude larger than the current largest. This important step of pairing allows for study of high-temperature stability in a sequence-dependent manner in the big data era. The data pipeline is parameterized and open, allowing it to be tuned by downstream users. We further show that the data contains signal for deep learning. This data offers a new doorway towards thermal stability design models.
format Online
Article
Text
id pubmed-10560248
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-10560248 2023-10-09 Homologous Pairs of Low and High Temperature Originating Proteins Spanning the Known Prokaryotic Universe Komp, Evan Alanzi, Humood N. Francis, Ryan Vuong, Chau Roberts, Logan Mosallanejad, Amin Beck, David A. C. Sci Data Data Descriptor Stability of proteins at high temperature has been a topic of interest for many years, as this attribute is favourable for applications ranging from therapeutics to industrial chemical manufacturing. Our current understanding and methods for designing high-temperature stability into target proteins are inadequate. To drive innovation in this space, we have curated a large dataset, learn2thermDB, of protein-temperature examples, totalling 24 million instances, and paired proteins across temperatures based on homology, yielding 69 million protein pairs - orders of magnitude larger than the current largest. This important step of pairing allows for study of high-temperature stability in a sequence-dependent manner in the big data era. The data pipeline is parameterized and open, allowing it to be tuned by downstream users. We further show that the data contains signal for deep learning. This data offers a new doorway towards thermal stability design models. Nature Publishing Group UK 2023-10-07 /pmc/articles/PMC10560248/ /pubmed/37805601 http://dx.doi.org/10.1038/s41597-023-02553-w Text en © The Author(s) 2023, corrected publication 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
title Homologous Pairs of Low and High Temperature Originating Proteins Spanning the Known Prokaryotic Universe
topic Data Descriptor