TRILL: ORCHESTRATING MODULAR DEEP-LEARNING WORKFLOWS FOR DEMOCRATIZED, SCALABLE PROTEIN ANALYSIS AND ENGINEERING

Deep-learning models have been rapidly adopted by many fields, partly due to the deluge of data humanity has amassed. In particular, the petabases of biological sequencing data enable the unsupervised training of protein language models that learn the “language of life.” However, due to their prohib...

Bibliographic Details
Main Authors: Martinez, Zachary A; Murray, Richard M.; Thomson, Matt W.
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10659302/
https://www.ncbi.nlm.nih.gov/pubmed/37986952
http://dx.doi.org/10.1101/2023.10.24.563881
_version_ 1785148305679843328
author Martinez, Zachary A
Murray, Richard M.
Thomson, Matt W.
collection PubMed
description Deep-learning models have been rapidly adopted by many fields, partly due to the deluge of data humanity has amassed. In particular, the petabases of biological sequencing data enable the unsupervised training of protein language models that learn the “language of life.” However, due to their prohibitive size and complexity, contemporary deep-learning models are often unwieldy, especially for scientists with limited machine learning backgrounds. TRILL (TRaining and Inference using the Language of Life) is a platform for creative protein design and discovery. Leveraging several state-of-the-art models such as ESM-2, DiffDock, and RFDiffusion, TRILL allows researchers to generate novel proteins, predict 3-D structures, extract high-dimensional representations of proteins, functionally classify proteins and more. What sets TRILL apart is its ability to enable complex pipelines by chaining together models and effectively merging the capabilities of different models to achieve a sum greater than its individual parts. Whether using Google Colab with one GPU or a supercomputer with hundreds, TRILL allows scientists to effectively utilize models with millions to billions of parameters by using optimized training strategies such as ZeRO-Offload and distributed data parallel. Therefore, TRILL not only bridges the gap between complex deep-learning models and their practical application in the field of biology, but also simplifies the orchestration of these models into comprehensive workflows, democratizing access to powerful methods. Documentation: https://trill.readthedocs.io/en/latest/home.html.
format Online
Article
Text
id pubmed-10659302
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Cold Spring Harbor Laboratory
record_format MEDLINE/PubMed
spelling pubmed-10659302 2023-11-20 TRILL: ORCHESTRATING MODULAR DEEP-LEARNING WORKFLOWS FOR DEMOCRATIZED, SCALABLE PROTEIN ANALYSIS AND ENGINEERING Martinez, Zachary A Murray, Richard M. Thomson, Matt W. bioRxiv Article
Cold Spring Harbor Laboratory 2023-11-10 /pmc/articles/PMC10659302/ /pubmed/37986952 http://dx.doi.org/10.1101/2023.10.24.563881 Text en https://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use.
title TRILL: ORCHESTRATING MODULAR DEEP-LEARNING WORKFLOWS FOR DEMOCRATIZED, SCALABLE PROTEIN ANALYSIS AND ENGINEERING
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10659302/
https://www.ncbi.nlm.nih.gov/pubmed/37986952
http://dx.doi.org/10.1101/2023.10.24.563881