Improving Protein Expression Prediction Using Extra Features and Ensemble Averaging
This article focuses on improving machine learning models that predict protein expression levels from their codon encoding. Support vector regression (SVR) and partial least squares (PLS) were used to create the models. SVR yields predictions that surpass those of PLS. It is shown...
Main Authors: | Fernandes, Armando; Vinga, Susana |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2016 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4775025/ https://www.ncbi.nlm.nih.gov/pubmed/26934190 http://dx.doi.org/10.1371/journal.pone.0150369 |
_version_ | 1782419010195816448 |
---|---|
author | Fernandes, Armando Vinga, Susana |
author_facet | Fernandes, Armando Vinga, Susana |
author_sort | Fernandes, Armando |
collection | PubMed |
description | This article focuses on improving machine learning models that predict protein expression levels from their codon encoding. Support vector regression (SVR) and partial least squares (PLS) were used to create the models. SVR yields predictions that surpass those of PLS. It is shown that the models' predictive ability can be improved by using two more input features, codon identification number and codon count, in addition to the previously used codon bias and minimum free energy. Applying ensemble averaging to the SVR or PLS models improves the results further. The present work motivates testing different ensembles and features with the aim of improving the prediction models, whose correlation coefficients are still far from perfect. These results are relevant for the optimization of codon usage and the enhancement of protein expression levels in synthetic biology problems. |
format | Online Article Text |
id | pubmed-4775025 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-47750252016-03-10 Improving Protein Expression Prediction Using Extra Features and Ensemble Averaging Fernandes, Armando Vinga, Susana PLoS One Research Article This article focuses on improving machine learning models that predict protein expression levels from their codon encoding. Support vector regression (SVR) and partial least squares (PLS) were used to create the models. SVR yields predictions that surpass those of PLS. It is shown that the models' predictive ability can be improved by using two more input features, codon identification number and codon count, in addition to the previously used codon bias and minimum free energy. Applying ensemble averaging to the SVR or PLS models improves the results further. The present work motivates testing different ensembles and features with the aim of improving the prediction models, whose correlation coefficients are still far from perfect. These results are relevant for the optimization of codon usage and the enhancement of protein expression levels in synthetic biology problems. Public Library of Science 2016-03-02 /pmc/articles/PMC4775025/ /pubmed/26934190 http://dx.doi.org/10.1371/journal.pone.0150369 Text en © 2016 Fernandes, Vinga http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Fernandes, Armando Vinga, Susana Improving Protein Expression Prediction Using Extra Features and Ensemble Averaging |
title | Improving Protein Expression Prediction Using Extra Features and Ensemble Averaging |
title_full | Improving Protein Expression Prediction Using Extra Features and Ensemble Averaging |
title_fullStr | Improving Protein Expression Prediction Using Extra Features and Ensemble Averaging |
title_full_unstemmed | Improving Protein Expression Prediction Using Extra Features and Ensemble Averaging |
title_short | Improving Protein Expression Prediction Using Extra Features and Ensemble Averaging |
title_sort | improving protein expression prediction using extra features and ensemble averaging |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4775025/ https://www.ncbi.nlm.nih.gov/pubmed/26934190 http://dx.doi.org/10.1371/journal.pone.0150369 |
work_keys_str_mv | AT fernandesarmando improvingproteinexpressionpredictionusingextrafeaturesandensembleaveraging AT vingasusana improvingproteinexpressionpredictionusingextrafeaturesandensembleaveraging |
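
The record's description mentions ensemble averaging: several regression models are trained and their predictions are averaged. The sketch below illustrates that general idea only and is not the authors' pipeline. It assumes bootstrap resampling to build the ensemble and swaps in a one-feature least-squares line for SVR or PLS so that it runs with the standard library alone. The data are synthetic, and every name (`fit_line`, `train_ensemble`, `predict_ensemble`, `pearson`) is hypothetical.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b (stand-in base learner, not SVR/PLS)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:  # degenerate resample: fall back to a constant model
        return 0.0, my
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx


def train_ensemble(xs, ys, n_models=25, seed=0):
    """Train n_models base learners, each on a bootstrap resample (assumption)."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def predict_ensemble(models, x):
    """Ensemble averaging: the mean of the individual model predictions."""
    return mean(a * x + b for a, b in models)


def pearson(u, v):
    """Pearson correlation coefficient between two equal-length sequences."""
    mu, mv = mean(u), mean(v)
    num = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    den = (sum((p - mu) ** 2 for p in u) * sum((q - mv) ** 2 for q in v)) ** 0.5
    return num / den


# Synthetic stand-in data: one made-up input feature and a noisy linear target.
_rng = random.Random(42)
feature = [i / 10 for i in range(50)]
expression = [2 * x + 1 + _rng.gauss(0, 0.5) for x in feature]

ensemble = train_ensemble(feature, expression)
predicted = [predict_ensemble(ensemble, x) for x in feature]
print("correlation with targets:", round(pearson(predicted, expression), 3))
```

With real data, the base learner would be replaced by an SVR or PLS model and the single feature by the full feature vector (codon bias, minimum free energy, codon identification number, codon count). The correlation should then be computed on held-out sequences rather than on the training set as done here.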