
Improving Protein Expression Prediction Using Extra Features and Ensemble Averaging

This article focuses on improving machine learning models that predict protein expression levels from their codon encoding. Support vector regression (SVR) and partial least squares (PLS) were used to create the models. SVR yields predictions that surpass those of PLS. It is shown...


Bibliographic Details
Main Authors: Fernandes, Armando, Vinga, Susana
Format: Online Article Text
Language: English
Published: Public Library of Science 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4775025/
https://www.ncbi.nlm.nih.gov/pubmed/26934190
http://dx.doi.org/10.1371/journal.pone.0150369
_version_ 1782419010195816448
author Fernandes, Armando
Vinga, Susana
author_facet Fernandes, Armando
Vinga, Susana
author_sort Fernandes, Armando
collection PubMed
description This article focuses on improving machine learning models that predict protein expression levels from their codon encoding. Support vector regression (SVR) and partial least squares (PLS) were used to create the models. SVR yields predictions that surpass those of PLS. It is shown that the models' predictive ability can be improved by using two additional input features, codon identification number and codon count, alongside the already used codon bias and minimum free energy. In addition, applying ensemble averaging to the SVR or PLS models improves the results further. The present work motivates testing different ensembles and features with the aim of improving prediction models whose correlation coefficients are still far from perfect. These results are relevant for the optimization of codon usage and the enhancement of protein expression levels in synthetic biology problems.
format Online
Article
Text
id pubmed-4775025
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-47750252016-03-10 Improving Protein Expression Prediction Using Extra Features and Ensemble Averaging Fernandes, Armando Vinga, Susana PLoS One Research Article This article focuses on improving machine learning models that predict protein expression levels from their codon encoding. Support vector regression (SVR) and partial least squares (PLS) were used to create the models. SVR yields predictions that surpass those of PLS. It is shown that the models' predictive ability can be improved by using two additional input features, codon identification number and codon count, alongside the already used codon bias and minimum free energy. In addition, applying ensemble averaging to the SVR or PLS models improves the results further. The present work motivates testing different ensembles and features with the aim of improving prediction models whose correlation coefficients are still far from perfect. These results are relevant for the optimization of codon usage and the enhancement of protein expression levels in synthetic biology problems. Public Library of Science 2016-03-02 /pmc/articles/PMC4775025/ /pubmed/26934190 http://dx.doi.org/10.1371/journal.pone.0150369 Text en © 2016 Fernandes, Vinga http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Fernandes, Armando
Vinga, Susana
Improving Protein Expression Prediction Using Extra Features and Ensemble Averaging
title Improving Protein Expression Prediction Using Extra Features and Ensemble Averaging
title_full Improving Protein Expression Prediction Using Extra Features and Ensemble Averaging
title_fullStr Improving Protein Expression Prediction Using Extra Features and Ensemble Averaging
title_full_unstemmed Improving Protein Expression Prediction Using Extra Features and Ensemble Averaging
title_short Improving Protein Expression Prediction Using Extra Features and Ensemble Averaging
title_sort improving protein expression prediction using extra features and ensemble averaging
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4775025/
https://www.ncbi.nlm.nih.gov/pubmed/26934190
http://dx.doi.org/10.1371/journal.pone.0150369
work_keys_str_mv AT fernandesarmando improvingproteinexpressionpredictionusingextrafeaturesandensembleaveraging
AT vingasusana improvingproteinexpressionpredictionusingextrafeaturesandensembleaveraging