
Beyond Tripeptides Two-Step Active Machine Learning for Very Large Data sets

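Self-assembling peptide nanostructures have been shown to be of great importance in nature and have presented many promising applications, for example, in medicine as drug-delivery vehicles, biosensors, and antivirals. Being very promising candidates for the growing field of bottom-up manufacture of functional nanomaterials, previous work (Frederix et al., 2011 and 2015) has screened all possible amino acid combinations for di- and tripeptides in search of such materials. However, the enormous complexity and variety of linear combinations of the 20 amino acids make exhaustive simulation of all combinations of tetrapeptides and above infeasible. Therefore, we have developed an active machine-learning method (also known as “iterative learning” and “evolutionary search method”) which leverages a lower-resolution data set encompassing the whole search space and a just-in-time high-resolution data set which further analyzes those target peptides selected by the lower-resolution model. This model uses newly generated data upon each iteration to improve both lower- and higher-resolution models in the search for ideal candidates. Curation of the lower-resolution data set is explored as a method to control the selected candidates, based on criteria such as log P. A major aim of this method is to produce the best results in the least computationally demanding way. This model has been developed to be broadly applicable to other search spaces with minor changes to the algorithm, allowing its use in other areas of research.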
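Exhaustive screening stops scaling quickly because a linear peptide of length n built from 20 amino acids has 20^n sequences: 400 dipeptides, 8,000 tripeptides, 160,000 tetrapeptides. The Python below is a minimal, hypothetical sketch of a two-step active-learning loop of the general kind this record describes; it is not the authors' implementation. Every name in it is an assumption made for illustration: `low_res_score` stands in for the cheap data set computed for the whole search space, `high_res_score` stands in for the expensive calculation run only on selected candidates, `log_p_proxy` is a made-up substitute for log P used to curate the space, and the per-residue numbers are random values, not physical constants. Each round ranks the not-yet-evaluated peptides with a surrogate (the low-resolution score plus a learned per-residue correction), evaluates the top batch at high resolution, and refits the correction on all high-resolution results gathered so far.

```python
import itertools
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes

# Hypothetical per-residue numbers drawn at random; NOT real physicochemical data.
_rng = random.Random(0)
HYDRO = {aa: _rng.uniform(-1.0, 1.0) for aa in AMINO_ACIDS}
HIDDEN_EFFECT = {aa: _rng.uniform(-0.5, 0.5) for aa in AMINO_ACIDS}


def search_space_size(length):
    """Number of linear peptides of a given length: 20**length."""
    return len(AMINO_ACIDS) ** length


def search_space(length):
    """Yield every linear peptide of the given length."""
    for combo in itertools.product(AMINO_ACIDS, repeat=length):
        yield "".join(combo)


def log_p_proxy(seq):
    """Hypothetical stand-in for log P: a sum of per-residue values."""
    return sum(HYDRO[aa] for aa in seq)


def low_res_score(seq):
    """Cheap approximate score, affordable for the whole search space."""
    return log_p_proxy(seq) / len(seq)


def high_res_score(seq):
    """Toy 'expensive' score; in practice this would be a simulation."""
    return low_res_score(seq) + sum(HIDDEN_EFFECT[aa] for aa in seq) / len(seq)


def predict(seq, low, correction):
    """Surrogate: low-resolution score plus the learned per-residue correction."""
    return low[seq] + sum(correction[aa] for aa in seq) / len(seq)


def fit_correction(low, high):
    """Average high-minus-low residual for every residue seen so far."""
    totals = dict.fromkeys(AMINO_ACIDS, 0.0)
    counts = dict.fromkeys(AMINO_ACIDS, 0)
    for seq, value in high.items():
        residual = value - low[seq]
        for aa in seq:
            totals[aa] += residual
            counts[aa] += 1
    return {aa: totals[aa] / counts[aa] if counts[aa] else 0.0 for aa in AMINO_ACIDS}


def two_step_search(length=3, rounds=5, batch=10, log_p_range=(-1.5, 1.5)):
    lo, hi = log_p_range
    # Step 1: low-resolution data set over the whole search space, curated by log P.
    low = {s: low_res_score(s) for s in search_space(length) if lo <= log_p_proxy(s) <= hi}
    high = {}  # Step 2: high-resolution results, computed just in time.
    correction = dict.fromkeys(AMINO_ACIDS, 0.0)
    for _ in range(rounds):
        # Rank only peptides that have not been evaluated at high resolution yet.
        pool = [s for s in low if s not in high]
        pool.sort(key=lambda s: predict(s, low, correction), reverse=True)
        for seq in pool[:batch]:
            high[seq] = high_res_score(seq)
        # Refit the surrogate on every high-resolution result gathered so far.
        correction = fit_correction(low, high)
    if not high:
        return None
    best = max(high, key=high.get)
    return best, high[best], len(high), len(low)


if __name__ == "__main__":
    for n in (2, 3, 4):
        print(n, search_space_size(n))
    print(two_step_search())
```

Narrowing `log_p_range` means only peptides inside that window are ever ranked or sent to the expensive step, which is one simple way to read curation of the lower-resolution data set as a control on the selected candidates. Only `rounds * batch` high-resolution evaluations are made, however large the curated space is.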


Bibliographic Details
Main authors: van Teijlingen, Alexander; Tuttle, Tell
Format: Online article, text
Language: English
Published: American Chemical Society, 2021
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8278388/
https://www.ncbi.nlm.nih.gov/pubmed/33904712
http://dx.doi.org/10.1021/acs.jctc.1c00159
Collection: PubMed
ID: pubmed-8278388
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: J Chem Theory Comput
Record date: 2021-07-14
Publication dates: 2021-04-27, 2021-05-11
Copyright: © 2021 The Authors. Published by American Chemical Society.
License: Permits the broadest form of re-use, including for commercial purposes, provided that author attribution and integrity are maintained (https://creativecommons.org/licenses/by/4.0/).