ManyFold: an efficient and flexible library for training and validating protein folding models

SUMMARY: ManyFold is a flexible library for protein structure prediction with deep learning that (i) supports models that use both multiple sequence alignments (MSAs) and protein language model (pLM) embedding as inputs, (ii) allows inference of existing models (AlphaFold and OpenFold), (iii) is ful...

Bibliographic Details
Main authors: Villegas-Morcillo, Amelia; Robinson, Louis; Flajolet, Arthur; Barrett, Thomas D
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9825755/
https://www.ncbi.nlm.nih.gov/pubmed/36495196
http://dx.doi.org/10.1093/bioinformatics/btac773
_version_ 1784866691139764224
author Villegas-Morcillo, Amelia
Robinson, Louis
Flajolet, Arthur
Barrett, Thomas D
author_sort Villegas-Morcillo, Amelia
collection PubMed
description SUMMARY: ManyFold is a flexible library for protein structure prediction with deep learning that (i) supports models that use both multiple sequence alignments (MSAs) and protein language model (pLM) embedding as inputs, (ii) allows inference of existing models (AlphaFold and OpenFold), (iii) is fully trainable, allowing for both fine-tuning and the training of new models from scratch and (iv) is written in Jax to support efficient batched operation in distributed settings. A proof-of-concept pLM-based model, pLMFold, is trained from scratch to obtain reasonable results with reduced computational overheads in comparison to AlphaFold. AVAILABILITY AND IMPLEMENTATION: The source code for ManyFold, the validation dataset and a small sample of training data are available at https://github.com/instadeepai/manyfold. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-9825755
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9825755 2023-01-10 ManyFold: an efficient and flexible library for training and validating protein folding models Villegas-Morcillo, Amelia Robinson, Louis Flajolet, Arthur Barrett, Thomas D Bioinformatics Applications Note SUMMARY: ManyFold is a flexible library for protein structure prediction with deep learning that (i) supports models that use both multiple sequence alignments (MSAs) and protein language model (pLM) embedding as inputs, (ii) allows inference of existing models (AlphaFold and OpenFold), (iii) is fully trainable, allowing for both fine-tuning and the training of new models from scratch and (iv) is written in Jax to support efficient batched operation in distributed settings. A proof-of-concept pLM-based model, pLMFold, is trained from scratch to obtain reasonable results with reduced computational overheads in comparison to AlphaFold. AVAILABILITY AND IMPLEMENTATION: The source code for ManyFold, the validation dataset and a small sample of training data are available at https://github.com/instadeepai/manyfold. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. Oxford University Press 2022-12-10 /pmc/articles/PMC9825755/ /pubmed/36495196 http://dx.doi.org/10.1093/bioinformatics/btac773 Text en © The Author(s) 2022. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title ManyFold: an efficient and flexible library for training and validating protein folding models
topic Applications Note
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9825755/
https://www.ncbi.nlm.nih.gov/pubmed/36495196
http://dx.doi.org/10.1093/bioinformatics/btac773