A Method for Prediction of Thermophilic Protein Based on Reduced Amino Acids and Mixed Features

The thermostability of proteins is a key factor considered during enzyme engineering, and finding a method that can identify thermophilic and non-thermophilic proteins will be helpful for enzyme design. In this study, we established a novel method combining mixed features and machine learning to ach...

Bibliographic Details
Main Authors: Feng, Changli; Ma, Zhaogui; Yang, Deyun; Li, Xin; Zhang, Jun; Li, Yanjuan
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2020
Subjects: Bioengineering and Biotechnology
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7214540/
https://www.ncbi.nlm.nih.gov/pubmed/32432088
http://dx.doi.org/10.3389/fbioe.2020.00285
_version_ 1783531995690696704
author Feng, Changli
Ma, Zhaogui
Yang, Deyun
Li, Xin
Zhang, Jun
Li, Yanjuan
author_facet Feng, Changli
Ma, Zhaogui
Yang, Deyun
Li, Xin
Zhang, Jun
Li, Yanjuan
author_sort Feng, Changli
collection PubMed
description The thermostability of proteins is a key factor considered during enzyme engineering, and finding a method that can identify thermophilic and non-thermophilic proteins will be helpful for enzyme design. In this study, we established a novel method combining mixed features and machine learning to achieve this recognition task. In this method, an amino acid reduction scheme was adopted to recode the amino acid sequence. Then, the physicochemical characteristics, auto-cross covariance (ACC), and reduced dipeptides were calculated and integrated to form a mixed feature set, which was processed using correlation analysis, feature selection, and principal component analysis (PCA) to remove redundant information. Finally, four machine learning methods and a dataset containing 500 random observations out of 915 thermophilic proteins and 500 random samples out of 793 non-thermophilic proteins were used for training and prediction. The experimental results showed that 98.2% of thermophilic and non-thermophilic proteins were correctly identified using 10-fold cross-validation. Moreover, our analysis of the final reserved features and removed features yielded information about the crucial, unimportant, and insensitive elements, and it also provided essential information for enzyme design.
format Online
Article
Text
id pubmed-7214540
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-72145402020-05-19 A Method for Prediction of Thermophilic Protein Based on Reduced Amino Acids and Mixed Features Feng, Changli Ma, Zhaogui Yang, Deyun Li, Xin Zhang, Jun Li, Yanjuan Front Bioeng Biotechnol Bioengineering and Biotechnology The thermostability of proteins is a key factor considered during enzyme engineering, and finding a method that can identify thermophilic and non-thermophilic proteins will be helpful for enzyme design. In this study, we established a novel method combining mixed features and machine learning to achieve this recognition task. In this method, an amino acid reduction scheme was adopted to recode the amino acid sequence. Then, the physicochemical characteristics, auto-cross covariance (ACC), and reduced dipeptides were calculated and integrated to form a mixed feature set, which was processed using correlation analysis, feature selection, and principal component analysis (PCA) to remove redundant information. Finally, four machine learning methods and a dataset containing 500 random observations out of 915 thermophilic proteins and 500 random samples out of 793 non-thermophilic proteins were used to train and predict the data. The experimental results showed that 98.2% of thermophilic and non-thermophilic proteins were correctly identified using 10-fold cross-validation. Moreover, our analysis of the final reserved features and removed features yielded information about the crucial, unimportant and insensitive elements, it also provided essential information for enzyme design. Frontiers Media S.A. 2020-05-05 /pmc/articles/PMC7214540/ /pubmed/32432088 http://dx.doi.org/10.3389/fbioe.2020.00285 Text en Copyright © 2020 Feng, Ma, Yang, Li, Zhang and Li. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). 
The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Bioengineering and Biotechnology
Feng, Changli
Ma, Zhaogui
Yang, Deyun
Li, Xin
Zhang, Jun
Li, Yanjuan
A Method for Prediction of Thermophilic Protein Based on Reduced Amino Acids and Mixed Features
title A Method for Prediction of Thermophilic Protein Based on Reduced Amino Acids and Mixed Features
title_full A Method for Prediction of Thermophilic Protein Based on Reduced Amino Acids and Mixed Features
title_fullStr A Method for Prediction of Thermophilic Protein Based on Reduced Amino Acids and Mixed Features
title_full_unstemmed A Method for Prediction of Thermophilic Protein Based on Reduced Amino Acids and Mixed Features
title_short A Method for Prediction of Thermophilic Protein Based on Reduced Amino Acids and Mixed Features
title_sort method for prediction of thermophilic protein based on reduced amino acids and mixed features
topic Bioengineering and Biotechnology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7214540/
https://www.ncbi.nlm.nih.gov/pubmed/32432088
http://dx.doi.org/10.3389/fbioe.2020.00285
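
The description field in this record mentions recoding sequences with an amino acid reduction scheme and computing reduced dipeptides. The following is a minimal Python sketch of that general idea, not the authors' implementation. The record does not say which reduction scheme was used, so the six-group alphabet `REDUCTION_GROUPS` is a hypothetical grouping chosen only for illustration, and the function names `reduce_sequence` and `reduced_dipeptide_composition` are likewise assumptions.

```python
from itertools import product

# Hypothetical six-group reduction of the 20 standard amino acids.
# This grouping is an assumption for illustration only; the record does
# not state which reduction scheme the authors adopted.
REDUCTION_GROUPS = {
    "A": "AVLIMC",  # aliphatic and sulfur-containing residues
    "F": "FWYH",    # aromatic residues
    "S": "STNQ",    # polar uncharged residues
    "K": "KR",      # positively charged residues
    "D": "DE",      # negatively charged residues
    "G": "GP",      # glycine and proline
}

# Map each residue to the single letter that represents its group.
RESIDUE_TO_GROUP = {
    residue: representative
    for representative, members in REDUCTION_GROUPS.items()
    for residue in members
}


def reduce_sequence(sequence):
    """Recode a protein sequence into the reduced alphabet.

    Characters outside the 20 standard amino acids are skipped.
    """
    return "".join(
        RESIDUE_TO_GROUP[residue]
        for residue in sequence.upper()
        if residue in RESIDUE_TO_GROUP
    )


def reduced_dipeptide_composition(sequence):
    """Return the frequency of every reduced dipeptide in the sequence.

    The result always has one entry per ordered pair of group letters,
    so every sequence yields a feature vector of the same length.
    """
    reduced = reduce_sequence(sequence)
    alphabet = sorted(REDUCTION_GROUPS)
    counts = {a + b: 0 for a, b in product(alphabet, repeat=2)}
    # Count overlapping adjacent pairs in the reduced sequence.
    for i in range(len(reduced) - 1):
        counts[reduced[i:i + 2]] += 1
    total = max(len(reduced) - 1, 0)
    return {
        pair: (count / total if total else 0.0)
        for pair, count in counts.items()
    }


if __name__ == "__main__":
    features = reduced_dipeptide_composition("MKVLDE")
    print(reduce_sequence("MKVLDE"))
    print({pair: freq for pair, freq in features.items() if freq > 0})
```

With a six-letter alphabet the dipeptide vector has 36 entries instead of the 400 needed for the full 20-letter alphabet, which is the usual motivation for reducing the alphabet before building features. The other steps named in the description (physicochemical features, ACC, correlation analysis, feature selection, PCA, and the classifiers) are not covered by this sketch.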