ApoPred: Identification of Apolipoproteins and Their Subfamilies With Multifarious Features
Apolipoproteins are a group of plasma proteins associated with a variety of diseases, such as hyperlipidemia, atherosclerosis, Alzheimer’s disease, and diabetes. To investigate the function of apolipoproteins and to develop effective targets for related diseases, it is necessary to a...
Main authors: | Liu, Ting; Chen, Jia-Mao; Zhang, Dan; Zhang, Qian; Peng, Bowen; Xu, Lei; Tang, Hua |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2021 |
Subjects: | Cell and Developmental Biology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7820372/ https://www.ncbi.nlm.nih.gov/pubmed/33490085 http://dx.doi.org/10.3389/fcell.2020.621144 |
_version_ | 1783639196325380096 |
---|---|
author | Liu, Ting; Chen, Jia-Mao; Zhang, Dan; Zhang, Qian; Peng, Bowen; Xu, Lei; Tang, Hua |
author_facet | Liu, Ting; Chen, Jia-Mao; Zhang, Dan; Zhang, Qian; Peng, Bowen; Xu, Lei; Tang, Hua |
author_sort | Liu, Ting |
collection | PubMed |
description | Apolipoproteins are a group of plasma proteins associated with a variety of diseases, such as hyperlipidemia, atherosclerosis, Alzheimer’s disease, and diabetes. To investigate the function of apolipoproteins and to develop effective targets for related diseases, it is necessary to accurately identify and classify apolipoproteins. Although apolipoproteins can be identified accurately through biochemical experiments, such experiments are expensive and time-consuming. This work aims to establish a high-efficiency and high-accuracy prediction model for the recognition of apolipoproteins and their subfamilies. We first constructed a high-quality benchmark dataset including 270 apolipoproteins and 535 non-apolipoproteins. Based on the dataset, pseudo-amino acid composition (PseAAC) and composition of k-spaced amino acid pairs (CKSAAP) were used as input vectors. To improve the prediction accuracy and eliminate redundant information, analysis of variance (ANOVA) was used to rank the features, and incremental feature selection was used to obtain the best feature subset. A support vector machine (SVM) was used to construct the classification model, which achieved an accuracy of 97.27%, a sensitivity of 96.30%, and a specificity of 97.76% for discriminating apolipoproteins from non-apolipoproteins in 10-fold cross-validation. In addition, the same process was repeated to generate a new model for predicting apolipoprotein subfamilies, which achieved an overall accuracy of 95.93% in 10-fold cross-validation. Based on the proposed models, a convenient webserver called ApoPred was established, which can be freely accessed at http://tang-biolab.com/server/ApoPred/service.html. We expect that this work will contribute to apolipoprotein function research and drug development in relevant diseases. |
format | Online Article Text |
id | pubmed-7820372 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-7820372 2021-01-23 ApoPred: Identification of Apolipoproteins and Their Subfamilies With Multifarious Features Liu, Ting; Chen, Jia-Mao; Zhang, Dan; Zhang, Qian; Peng, Bowen; Xu, Lei; Tang, Hua Front Cell Dev Biol Cell and Developmental Biology Apolipoproteins are a group of plasma proteins associated with a variety of diseases, such as hyperlipidemia, atherosclerosis, Alzheimer’s disease, and diabetes. To investigate the function of apolipoproteins and to develop effective targets for related diseases, it is necessary to accurately identify and classify apolipoproteins. Although apolipoproteins can be identified accurately through biochemical experiments, such experiments are expensive and time-consuming. This work aims to establish a high-efficiency and high-accuracy prediction model for the recognition of apolipoproteins and their subfamilies. We first constructed a high-quality benchmark dataset including 270 apolipoproteins and 535 non-apolipoproteins. Based on the dataset, pseudo-amino acid composition (PseAAC) and composition of k-spaced amino acid pairs (CKSAAP) were used as input vectors. To improve the prediction accuracy and eliminate redundant information, analysis of variance (ANOVA) was used to rank the features, and incremental feature selection was used to obtain the best feature subset. A support vector machine (SVM) was used to construct the classification model, which achieved an accuracy of 97.27%, a sensitivity of 96.30%, and a specificity of 97.76% for discriminating apolipoproteins from non-apolipoproteins in 10-fold cross-validation. In addition, the same process was repeated to generate a new model for predicting apolipoprotein subfamilies, which achieved an overall accuracy of 95.93% in 10-fold cross-validation. Based on the proposed models, a convenient webserver called ApoPred was established, which can be freely accessed at http://tang-biolab.com/server/ApoPred/service.html. We expect that this work will contribute to apolipoprotein function research and drug development in relevant diseases. Frontiers Media S.A. 2021-01-08 /pmc/articles/PMC7820372/ /pubmed/33490085 http://dx.doi.org/10.3389/fcell.2020.621144 Text en Copyright © 2021 Liu, Chen, Zhang, Zhang, Peng, Xu and Tang. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Cell and Developmental Biology Liu, Ting Chen, Jia-Mao Zhang, Dan Zhang, Qian Peng, Bowen Xu, Lei Tang, Hua ApoPred: Identification of Apolipoproteins and Their Subfamilies With Multifarious Features |
title | ApoPred: Identification of Apolipoproteins and Their Subfamilies With Multifarious Features |
title_full | ApoPred: Identification of Apolipoproteins and Their Subfamilies With Multifarious Features |
title_fullStr | ApoPred: Identification of Apolipoproteins and Their Subfamilies With Multifarious Features |
title_full_unstemmed | ApoPred: Identification of Apolipoproteins and Their Subfamilies With Multifarious Features |
title_short | ApoPred: Identification of Apolipoproteins and Their Subfamilies With Multifarious Features |
title_sort | apopred: identification of apolipoproteins and their subfamilies with multifarious features |
topic | Cell and Developmental Biology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7820372/ https://www.ncbi.nlm.nih.gov/pubmed/33490085 http://dx.doi.org/10.3389/fcell.2020.621144 |
work_keys_str_mv | AT liuting apopredidentificationofapolipoproteinsandtheirsubfamilieswithmultifariousfeatures AT chenjiamao apopredidentificationofapolipoproteinsandtheirsubfamilieswithmultifariousfeatures AT zhangdan apopredidentificationofapolipoproteinsandtheirsubfamilieswithmultifariousfeatures AT zhangqian apopredidentificationofapolipoproteinsandtheirsubfamilieswithmultifariousfeatures AT pengbowen apopredidentificationofapolipoproteinsandtheirsubfamilieswithmultifariousfeatures AT xulei apopredidentificationofapolipoproteinsandtheirsubfamilieswithmultifariousfeatures AT tanghua apopredidentificationofapolipoproteinsandtheirsubfamilieswithmultifariousfeatures |
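
The abstract in this record names CKSAAP as one of the input encodings and reports accuracy, sensitivity, and specificity on a dataset of 270 apolipoproteins and 535 non-apolipoproteins. The following is a minimal Python sketch of those two ideas, not the authors' implementation. The function names, the `k_max` parameter and its default, and the choice to skip pairs containing non-standard residues are all assumptions made for illustration. The confusion-matrix counts in the final lines are not stated in the record; they are inferred here as the whole-number counts whose ratios round to the reported percentages.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def cksaap(sequence, k_max=2):
    """Composition of k-spaced amino acid pairs (illustrative sketch).

    For each gap k in 0..k_max, count residue pairs separated by k
    positions and normalise by the number of counted pairs.
    Returns a flat list of 400 * (k_max + 1) frequencies.
    k_max=2 is an arbitrary default, not a value taken from the paper.
    """
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        total = 0
        for i in range(len(sequence) - k - 1):
            pair = sequence[i] + sequence[i + k + 1]
            if pair in counts:  # assumption: ignore non-standard residues
                counts[pair] += 1
                total += 1
        features.extend(counts[p] / total if total else 0.0 for p in PAIRS)
    return features


def binary_metrics(tp, tn, fp, fn):
    """Return (accuracy, sensitivity, specificity) as percentages."""
    accuracy = 100.0 * (tp + tn) / (tp + tn + fp + fn)
    sensitivity = 100.0 * tp / (tp + fn)  # true positive rate
    specificity = 100.0 * tn / (tn + fp)  # true negative rate
    return accuracy, sensitivity, specificity


# Inferred counts (an assumption): 260 of 270 positives and 523 of 535
# negatives correct reproduces the percentages quoted in the abstract.
acc, sen, spe = binary_metrics(tp=260, tn=523, fp=12, fn=10)
print(f"accuracy={acc:.2f}% sensitivity={sen:.2f}% specificity={spe:.2f}%")
```

With these inferred counts the three reported figures are mutually consistent: (260 + 523) / 805 gives 97.27%, 260 / 270 gives 96.30%, and 523 / 535 gives 97.76%.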