ApoPred: Identification of Apolipoproteins and Their Subfamilies With Multifarious Features

Apolipoproteins are a group of plasma proteins associated with a variety of diseases, such as hyperlipidemia, atherosclerosis, Alzheimer’s disease, and diabetes. To investigate the function of apolipoproteins and to develop effective targets for related diseases, it is necessary to a...

Bibliographic Details
Main Authors:	Liu, Ting; Chen, Jia-Mao; Zhang, Dan; Zhang, Qian; Peng, Bowen; Xu, Lei; Tang, Hua
Format:	Online Article Text
Language:	English
Published:	Frontiers Media S.A. 2021
Subjects:	Cell and Developmental Biology
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7820372/ https://www.ncbi.nlm.nih.gov/pubmed/33490085 http://dx.doi.org/10.3389/fcell.2020.621144

_version_	1783639196325380096
author	Liu, Ting Chen, Jia-Mao Zhang, Dan Zhang, Qian Peng, Bowen Xu, Lei Tang, Hua
author_facet	Liu, Ting Chen, Jia-Mao Zhang, Dan Zhang, Qian Peng, Bowen Xu, Lei Tang, Hua
author_sort	Liu, Ting
collection	PubMed
description	Apolipoproteins are a group of plasma proteins associated with a variety of diseases, such as hyperlipidemia, atherosclerosis, Alzheimer’s disease, and diabetes. To investigate the function of apolipoproteins and to develop effective targets for related diseases, it is necessary to accurately identify and classify apolipoproteins. Although apolipoproteins can be identified accurately through biochemical experiments, such experiments are expensive and time-consuming. This work aims to establish a high-efficiency and high-accuracy prediction model for recognizing apolipoproteins and their subfamilies. We first constructed a high-quality benchmark dataset including 270 apolipoproteins and 535 non-apolipoproteins. Based on the dataset, pseudo-amino acid composition (PseAAC) and composition of k-spaced amino acid pairs (CKSAAP) were used as input vectors. To improve the prediction accuracy and eliminate redundant information, analysis of variance (ANOVA) was used to rank the features, and incremental feature selection was then used to obtain the best feature subset. A support vector machine (SVM) was used to construct the classification model, which achieved an accuracy of 97.27%, a sensitivity of 96.30%, and a specificity of 97.76% for discriminating apolipoproteins from non-apolipoproteins in 10-fold cross-validation. In addition, the same process was repeated to generate a new model for predicting apolipoprotein subfamilies. The new model achieved an overall accuracy of 95.93% in 10-fold cross-validation. Based on the proposed model, a convenient webserver called ApoPred was established, which can be freely accessed at http://tang-biolab.com/server/ApoPred/service.html. We expect that this work will contribute to apolipoprotein function research and drug development in relevant diseases.
format	Online Article Text
id	pubmed-7820372
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	Frontiers Media S.A.
record_format	MEDLINE/PubMed
spelling	pubmed-7820372 2021-01-23 ApoPred: Identification of Apolipoproteins and Their Subfamilies With Multifarious Features Liu, Ting Chen, Jia-Mao Zhang, Dan Zhang, Qian Peng, Bowen Xu, Lei Tang, Hua Front Cell Dev Biol Cell and Developmental Biology Frontiers Media S.A. 2021-01-08 /pmc/articles/PMC7820372/ /pubmed/33490085 http://dx.doi.org/10.3389/fcell.2020.621144 Text en Copyright © 2021 Liu, Chen, Zhang, Zhang, Peng, Xu and Tang. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle	Cell and Developmental Biology Liu, Ting Chen, Jia-Mao Zhang, Dan Zhang, Qian Peng, Bowen Xu, Lei Tang, Hua ApoPred: Identification of Apolipoproteins and Their Subfamilies With Multifarious Features
title	ApoPred: Identification of Apolipoproteins and Their Subfamilies With Multifarious Features
topic	Cell and Developmental Biology
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7820372/ https://www.ncbi.nlm.nih.gov/pubmed/33490085 http://dx.doi.org/10.3389/fcell.2020.621144
work_keys_str_mv	AT liuting apopredidentificationofapolipoproteinsandtheirsubfamilieswithmultifariousfeatures AT chenjiamao apopredidentificationofapolipoproteinsandtheirsubfamilieswithmultifariousfeatures AT zhangdan apopredidentificationofapolipoproteinsandtheirsubfamilieswithmultifariousfeatures AT zhangqian apopredidentificationofapolipoproteinsandtheirsubfamilieswithmultifariousfeatures AT pengbowen apopredidentificationofapolipoproteinsandtheirsubfamilieswithmultifariousfeatures AT xulei apopredidentificationofapolipoproteinsandtheirsubfamilieswithmultifariousfeatures AT tanghua apopredidentificationofapolipoproteinsandtheirsubfamilieswithmultifariousfeatures
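
The description field above names CKSAAP as one of the input encodings and reports sensitivity, specificity, and accuracy on a benchmark of 270 apolipoproteins and 535 non-apolipoproteins. The following is a minimal Python sketch of those two pieces, not the authors' implementation. The function names, the gap range (`max_gap`), and the handling of non-standard residues are illustrative assumptions; the record does not state which k values were used. The confusion-matrix counts in the usage note (260 true positives, 523 true negatives) are inferred from the reported percentages and are not given in the record.

```python
from itertools import product

# The 20 standard amino acids and all 400 ordered residue pairs.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def cksaap(sequence, max_gap=2):
    """Composition of k-spaced amino acid pairs for k = 0..max_gap.

    Returns 400 * (max_gap + 1) frequencies. max_gap=2 is an assumed
    default, not a value taken from the record.
    """
    seq = sequence.upper()
    features = []
    for k in range(max_gap + 1):
        counts = dict.fromkeys(PAIRS, 0)
        # Number of residue pairs separated by exactly k positions.
        windows = len(seq) - k - 1
        for i in range(max(windows, 0)):
            pair = seq[i] + seq[i + k + 1]
            if pair in counts:  # skip pairs with non-standard residues
                counts[pair] += 1
        total = max(windows, 1)  # guard against very short sequences
        features.extend(counts[p] / total for p in PAIRS)
    return features


def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy
```

As a usage check, `binary_metrics(260, 10, 523, 12)` gives values that round to 96.30%, 97.76%, and 97.27%, which matches the figures in the description under the assumption that the cross-validation predictions were pooled into a single confusion matrix over all 805 sequences.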
