A Method for Prediction of Thermophilic Protein Based on Reduced Amino Acids and Mixed Features
The thermostability of proteins is a key factor considered during enzyme engineering, and finding a method that can identify thermophilic and non-thermophilic proteins will be helpful for enzyme design. In this study, we established a novel method combining mixed features and machine learning to ach...
Main Authors: | Feng, Changli; Ma, Zhaogui; Yang, Deyun; Li, Xin; Zhang, Jun; Li, Yanjuan |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2020 |
Subjects: | Bioengineering and Biotechnology |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7214540/ https://www.ncbi.nlm.nih.gov/pubmed/32432088 http://dx.doi.org/10.3389/fbioe.2020.00285 |
_version_ | 1783531995690696704 |
---|---|
author | Feng, Changli; Ma, Zhaogui; Yang, Deyun; Li, Xin; Zhang, Jun; Li, Yanjuan |
author_facet | Feng, Changli; Ma, Zhaogui; Yang, Deyun; Li, Xin; Zhang, Jun; Li, Yanjuan |
author_sort | Feng, Changli |
collection | PubMed |
description | The thermostability of proteins is a key factor considered during enzyme engineering, and finding a method that can identify thermophilic and non-thermophilic proteins will be helpful for enzyme design. In this study, we established a novel method combining mixed features and machine learning to achieve this recognition task. In this method, an amino acid reduction scheme was adopted to recode the amino acid sequence. Then, the physicochemical characteristics, auto-cross covariance (ACC), and reduced dipeptides were calculated and integrated to form a mixed feature set, which was processed using correlation analysis, feature selection, and principal component analysis (PCA) to remove redundant information. Finally, four machine learning methods and a dataset containing 500 random observations out of 915 thermophilic proteins and 500 random samples out of 793 non-thermophilic proteins were used to train and predict the data. The experimental results showed that 98.2% of thermophilic and non-thermophilic proteins were correctly identified using 10-fold cross-validation. Moreover, our analysis of the final reserved features and removed features yielded information about the crucial, unimportant, and insensitive elements, and it also provided essential information for enzyme design. |
format | Online Article Text |
id | pubmed-7214540 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-72145402020-05-19 A Method for Prediction of Thermophilic Protein Based on Reduced Amino Acids and Mixed Features Feng, Changli Ma, Zhaogui Yang, Deyun Li, Xin Zhang, Jun Li, Yanjuan Front Bioeng Biotechnol Bioengineering and Biotechnology The thermostability of proteins is a key factor considered during enzyme engineering, and finding a method that can identify thermophilic and non-thermophilic proteins will be helpful for enzyme design. In this study, we established a novel method combining mixed features and machine learning to achieve this recognition task. In this method, an amino acid reduction scheme was adopted to recode the amino acid sequence. Then, the physicochemical characteristics, auto-cross covariance (ACC), and reduced dipeptides were calculated and integrated to form a mixed feature set, which was processed using correlation analysis, feature selection, and principal component analysis (PCA) to remove redundant information. Finally, four machine learning methods and a dataset containing 500 random observations out of 915 thermophilic proteins and 500 random samples out of 793 non-thermophilic proteins were used to train and predict the data. The experimental results showed that 98.2% of thermophilic and non-thermophilic proteins were correctly identified using 10-fold cross-validation. Moreover, our analysis of the final reserved features and removed features yielded information about the crucial, unimportant and insensitive elements, it also provided essential information for enzyme design. Frontiers Media S.A. 2020-05-05 /pmc/articles/PMC7214540/ /pubmed/32432088 http://dx.doi.org/10.3389/fbioe.2020.00285 Text en Copyright © 2020 Feng, Ma, Yang, Li, Zhang and Li. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). 
The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Bioengineering and Biotechnology Feng, Changli Ma, Zhaogui Yang, Deyun Li, Xin Zhang, Jun Li, Yanjuan A Method for Prediction of Thermophilic Protein Based on Reduced Amino Acids and Mixed Features |
title | A Method for Prediction of Thermophilic Protein Based on Reduced Amino Acids and Mixed Features |
topic | Bioengineering and Biotechnology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7214540/ https://www.ncbi.nlm.nih.gov/pubmed/32432088 http://dx.doi.org/10.3389/fbioe.2020.00285 |