
QUATgo: Protein quaternary structural attributes predicted by two-stage machine learning approaches with heterogeneous feature encoding

Bibliographic details
Main authors: Tung, Chi-Hua, Chien, Ching-Hsuan, Chen, Chi-Wei, Huang, Lan-Ying, Liu, Yu-Nan, Chu, Yen-Wei
Format: Online Article Text
Language: English
Published: Public Library of Science 2020
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7190164/
https://www.ncbi.nlm.nih.gov/pubmed/32348325
http://dx.doi.org/10.1371/journal.pone.0232087
collection PubMed
description Many proteins exist in nature as oligomers with various quaternary structural attributes rather than as single chains. Predicting these attributes is an essential task in computational biology for the advancement of proteomics. However, existing methods neither integrate heterogeneous feature encodings nor address the low accuracy for subunit categories with limited data. To this end, we propose QUATgo, a tool that predicts the quaternary structural attributes of protein oligomers, covering oligomers with more than 12 subunits. Three kinds of sequence encoding are used: dipeptide composition, applied here for the first time to predicting protein quaternary structural attributes; protein half-life characteristics; and functional domain composition, whose encoding method, proposed in earlier work, we modified to avoid excessively large feature vectors. QUATgo uses a two-stage architecture to address the scarcity of training data for individual subunit categories, and the predictive accuracy of the classifier was evaluated with 10-fold cross-validation. QUATgo achieves 49.0% accuracy in cross-validation and 31.1% on an independent test set. In a case study, QUATgo reached 61.5% accuracy in predicting the quaternary structure of influenza virus hemagglutinin proteins. QUATgo is freely available to the public as a web server at http://predictor.nchu.edu.tw/QUATgo.
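Dipeptide composition, one of the three encodings named in the abstract, is a standard sequence feature: the relative frequency of each of the 400 ordered pairs of adjacent amino acids. The following is a minimal Python sketch, assuming the 20 standard residues and skipping any pair that contains a non-standard letter. The function name `dipeptide_composition` is hypothetical, and this is not QUATgo's own code; the record does not state the exact normalization QUATgo uses.

```python
from itertools import product

# The 20 standard amino acids in a fixed order (assumption: one-letter codes only).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered dipeptides: "AA", "AC", ..., "YY".
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]
# Map each dipeptide to its position in the feature vector.
INDEX = {dp: i for i, dp in enumerate(DIPEPTIDES)}


def dipeptide_composition(sequence: str) -> list[float]:
    """Return a 400-dimensional dipeptide composition vector (hypothetical helper).

    Each entry is the count of one ordered dipeptide divided by the number of
    overlapping dipeptides counted. Pairs containing non-standard residues
    (e.g. X, B, Z) are skipped and excluded from the denominator.
    """
    seq = sequence.upper()
    counts = [0] * len(DIPEPTIDES)
    total = 0
    # Slide a window of width 2 over the sequence (overlapping pairs).
    for i in range(len(seq) - 1):
        idx = INDEX.get(seq[i:i + 2])
        if idx is not None:
            counts[idx] += 1
            total += 1
    # Sequences with no valid pair yield an all-zero vector.
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [c / total for c in counts]


if __name__ == "__main__":
    vec = dipeptide_composition("MKKAA")
    print(len(vec))           # 400
    print(vec[INDEX["KK"]])   # 0.25 (MK, KK, KA, AA each occur once)
```

A fixed-length vector like this can be fed to any standard classifier regardless of sequence length, which is why composition features are common in sequence-based predictors.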
id pubmed-7190164
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal PLoS One
article_type Research Article
published Public Library of Science 2020-04-29
rights © 2020 Tung et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.