ApoPred: Identification of Apolipoproteins and Their Subfamilies With Multifarious Features

Apolipoproteins are a group of plasma proteins associated with a variety of diseases, such as hyperlipidemia, atherosclerosis, Alzheimer’s disease, and diabetes. To investigate the function of apolipoproteins and to develop effective targets for related diseases, it is necessary to a...

Bibliographic Details
Main Authors: Liu, Ting; Chen, Jia-Mao; Zhang, Dan; Zhang, Qian; Peng, Bowen; Xu, Lei; Tang, Hua
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2021
Subjects: Cell and Developmental Biology
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7820372/
https://www.ncbi.nlm.nih.gov/pubmed/33490085
http://dx.doi.org/10.3389/fcell.2020.621144
_version_ 1783639196325380096
author Liu, Ting
Chen, Jia-Mao
Zhang, Dan
Zhang, Qian
Peng, Bowen
Xu, Lei
Tang, Hua
author_facet Liu, Ting
Chen, Jia-Mao
Zhang, Dan
Zhang, Qian
Peng, Bowen
Xu, Lei
Tang, Hua
author_sort Liu, Ting
collection PubMed
description Apolipoproteins are a group of plasma proteins associated with a variety of diseases, such as hyperlipidemia, atherosclerosis, Alzheimer’s disease, and diabetes. To investigate the function of apolipoproteins and to develop effective targets for related diseases, it is necessary to identify and classify apolipoproteins accurately. Although biochemical experiments can identify apolipoproteins accurately, they are expensive and time-consuming. This work aims to establish an efficient and accurate prediction model for recognizing apolipoproteins and their subfamilies. We first constructed a high-quality benchmark dataset of 270 apolipoproteins and 535 non-apolipoproteins. From this dataset, pseudo-amino acid composition (PseAAC) and composition of k-spaced amino acid pairs (CKSAAP) were extracted as input feature vectors. To improve prediction accuracy and eliminate redundant information, analysis of variance (ANOVA) was used to rank the features, and incremental feature selection was used to obtain the best feature subset. A support vector machine (SVM) was used to construct the classification model, which achieved an accuracy of 97.27%, a sensitivity of 96.30%, and a specificity of 97.76% in discriminating apolipoproteins from non-apolipoproteins in 10-fold cross-validation. The same process was then repeated to build a second model for predicting apolipoprotein subfamilies, which achieved an overall accuracy of 95.93% in 10-fold cross-validation. Based on the proposed models, a convenient web server called ApoPred was established, which can be freely accessed at http://tang-biolab.com/server/ApoPred/service.html. We expect that this work will contribute to apolipoprotein function research and drug development for related diseases.
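As a rough illustration of two ideas named in the description, the sketch below computes CKSAAP features for a protein sequence and checks the reported accuracy, sensitivity, and specificity against the dataset sizes. It is not the authors' code. The function names `cksaap` and `binary_metrics`, the default `k_max=2`, and the handling of non-standard residues are assumptions made for illustration; the record does not state the gap range used. The confusion counts (260 true positives, 523 true negatives) are not reported in the record; they are inferred here as the only integer counts that reproduce the stated percentages for 270 positives and 535 negatives.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs of the 20 standard amino acids.
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def cksaap(sequence, k_max=2):
    """Return CKSAAP frequencies for gaps k = 0..k_max (400 values per gap).

    For each gap k, residue pairs (seq[i], seq[i + k + 1]) are counted and
    normalized by the number of counted pairs. k_max=2 is an assumed default.
    """
    seq = sequence.upper()
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        total = 0
        for i in range(len(seq) - k - 1):
            pair = seq[i] + seq[i + k + 1]
            if pair in counts:  # assumption: skip pairs with non-standard residues
                counts[pair] += 1
                total += 1
        features.extend(counts[p] / total if total else 0.0 for p in PAIRS)
    return features


def binary_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return accuracy, sensitivity, specificity


# Inferred counts (not stated in the record): 270 positives, 535 negatives.
acc, sn, sp = binary_metrics(tp=260, fn=10, tn=523, fp=12)
print(f"accuracy={acc:.2%} sensitivity={sn:.2%} specificity={sp:.2%}")
# prints: accuracy=97.27% sensitivity=96.30% specificity=97.76%
```

With `k_max=2` each sequence maps to a 1200-dimensional vector (400 pair frequencies for each of three gaps); a ranking step such as ANOVA followed by incremental feature selection would then operate on vectors of this kind.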
format Online
Article
Text
id pubmed-7820372
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-7820372 2021-01-23 ApoPred: Identification of Apolipoproteins and Their Subfamilies With Multifarious Features Liu, Ting Chen, Jia-Mao Zhang, Dan Zhang, Qian Peng, Bowen Xu, Lei Tang, Hua Front Cell Dev Biol Cell and Developmental Biology Frontiers Media S.A. 2021-01-08 /pmc/articles/PMC7820372/ /pubmed/33490085 http://dx.doi.org/10.3389/fcell.2020.621144 Text en Copyright © 2021 Liu, Chen, Zhang, Zhang, Peng, Xu and Tang. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Cell and Developmental Biology
Liu, Ting
Chen, Jia-Mao
Zhang, Dan
Zhang, Qian
Peng, Bowen
Xu, Lei
Tang, Hua
ApoPred: Identification of Apolipoproteins and Their Subfamilies With Multifarious Features
title ApoPred: Identification of Apolipoproteins and Their Subfamilies With Multifarious Features
title_full ApoPred: Identification of Apolipoproteins and Their Subfamilies With Multifarious Features
title_fullStr ApoPred: Identification of Apolipoproteins and Their Subfamilies With Multifarious Features
title_full_unstemmed ApoPred: Identification of Apolipoproteins and Their Subfamilies With Multifarious Features
title_short ApoPred: Identification of Apolipoproteins and Their Subfamilies With Multifarious Features
title_sort apopred: identification of apolipoproteins and their subfamilies with multifarious features
topic Cell and Developmental Biology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7820372/
https://www.ncbi.nlm.nih.gov/pubmed/33490085
http://dx.doi.org/10.3389/fcell.2020.621144