
Generalized Property-Based Encoders and Digital Signal Processing Facilitate Predictive Tasks in Protein Engineering

Computational methods in protein engineering often require encoding amino acid sequences, i.e., converting them into numeric arrays. Physicochemical properties are a typical choice to define encoders, where we replace each amino acid by its value for a given property. However, what property (or grou...
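As a rough illustration of the property-based encoding described above, followed by the Fast Fourier Transform step named in the full description, the sketch below maps each residue to one physicochemical value and takes the amplitude spectrum of the resulting signal. It makes several assumptions that do not come from this record: the Kyte-Doolittle hydropathy scale stands in for a property (the article's own AAIndex-derived encoders are not reproduced here), sequences are zero-padded to a fixed length, and the function names `encode_sequence` and `to_spectrum` are hypothetical.

```python
import numpy as np

# Kyte-Doolittle hydropathy scale, chosen only as an example property.
# Assumption: any per-residue property table could be substituted here.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode_sequence(seq, scale=KYTE_DOOLITTLE, length=None):
    """Replace each amino acid by its property value.

    If `length` is given, the encoded vector is zero-padded on the right
    so that sequences of different sizes share one array shape.
    """
    values = np.array([scale[aa] for aa in seq.upper()], dtype=float)
    if length is not None:
        if len(values) > length:
            raise ValueError("sequence is longer than the target length")
        values = np.pad(values, (0, length - len(values)))
    return values


def to_spectrum(encoded):
    """Return the amplitude spectrum of a real-valued encoded sequence."""
    return np.abs(np.fft.rfft(encoded))


# Hypothetical 10-residue peptide, padded to 16 positions.
signal = encode_sequence("MKTAYIAKQR", length=16)
spectrum = to_spectrum(signal)
print(signal.shape, spectrum.shape)
```

Either `signal` or `spectrum` could then serve as the feature vector for a predictive model; which one works better for a given task is the kind of question the article investigates.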


Bibliographic Details
Main authors:	Medina-Ortiz, David, Contreras, Sebastian, Amado-Hinojosa, Juan, Torres-Almonacid, Jorge, Asenjo, Juan A., Navarrete, Marcelo, Olivera-Nappa, Álvaro
Format:	Online Article Text
Language:	English
Published:	Frontiers Media S.A. 2022
Subjects:	Molecular Biosciences
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9329607/ https://www.ncbi.nlm.nih.gov/pubmed/35911960 http://dx.doi.org/10.3389/fmolb.2022.898627

collection	PubMed
description	Computational methods in protein engineering often require encoding amino acid sequences, i.e., converting them into numeric arrays. Physicochemical properties are a typical choice to define encoders, where we replace each amino acid by its value for a given property. However, what property (or group thereof) is best for a given predictive task remains an open problem. In this work, we generalize property-based encoding strategies to maximize the performance of predictive models in protein engineering. First, combining text mining and unsupervised learning, we partitioned the AAIndex database into eight semantically-consistent groups of properties. We then applied a non-linear PCA within each group to define a single encoder to represent it. Then, in several case studies, we assess the performance of predictive models for protein and peptide function, folding, and biological activity, trained using the proposed encoders and classical methods (One Hot Encoder and TAPE embeddings). Models trained on datasets encoded with our encoders and converted to signals through the Fast Fourier Transform (FFT) increased their precision and reduced their overfitting substantially, outperforming classical approaches in most cases. Finally, we propose a preliminary methodology to create de novo sequences with desired properties. All these results offer simple ways to increase the performance of general and complex predictive tasks in protein engineering without increasing their complexity.
id	pubmed-9329607
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
spelling	pubmed-9329607 2022-07-29 Front Mol Biosci. Frontiers Media S.A. 2022-07-14. /pmc/articles/PMC9329607/ /pubmed/35911960 http://dx.doi.org/10.3389/fmolb.2022.898627 Text en. Copyright © 2022 Medina-Ortiz, Contreras, Amado-Hinojosa, Torres-Almonacid, Asenjo, Navarrete and Olivera-Nappa. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
