
A Method for Identifying Vesicle Transport Proteins Based on LibSVM and MRMD

With the development of computer technology, many machine learning algorithms have been applied to the field of biology, forming the discipline of bioinformatics. Protein function prediction is a classic research topic in this subject area. Though many scholars have made achievements in identifying...


Bibliographic Details
Main Authors: Tao, Zhiyu; Li, Yanjuan; Teng, Zhixia; Zhao, Yuming
Format: Online Article Text
Language: English
Published: Hindawi 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7591939/
https://www.ncbi.nlm.nih.gov/pubmed/33133228
http://dx.doi.org/10.1155/2020/8926750
_version_ 1783601092700930048
author Tao, Zhiyu
Li, Yanjuan
Teng, Zhixia
Zhao, Yuming
author_facet Tao, Zhiyu
Li, Yanjuan
Teng, Zhixia
Zhao, Yuming
author_sort Tao, Zhiyu
collection PubMed
description With the development of computer technology, many machine learning algorithms have been applied to the field of biology, forming the discipline of bioinformatics. Protein function prediction is a classic research topic in this subject area. Although many scholars have made progress in identifying proteins with different algorithms, they often extract a large number of feature types and use very complex classification methods to obtain little improvement in classification performance, and this process is very time-consuming. In this research, we attempt to use as few features as possible to classify vesicular transport proteins while still obtaining a comparatively satisfactory classification result. We adopt CTDC, a submethod of the composition, transition, and distribution (CTD) method, to extract only 39 features from each sequence, and we use LibSVM as the classification method. We use the SMOTE method to deal with the problem of dataset imbalance. There are 11619 protein sequences in our dataset. We selected 4428 sequences to train our classification model and another 1832 sequences from our dataset to test it, and finally achieved an accuracy of 71.77%. After dimension reduction by MRMD, the accuracy is 72.16%.
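The composition step named in the description (CTDC) can be illustrated with a minimal sketch. It assumes the commonly used three-class hydrophobicity grouping of amino acids and computes the fraction of residues falling in each class. The function name `ctdc_one_property` and the constant `HYDROPHOBICITY_GROUPS` are hypothetical, and the record does not list which properties the authors used. One property yields 3 features here; 13 such properties would give the 39 features mentioned, but that breakdown is an assumption, not something stated in this record.

```python
from typing import Dict, List

# Assumed three-class hydrophobicity grouping often used with CTD descriptors.
# The record itself does not specify the groupings the authors applied.
HYDROPHOBICITY_GROUPS: Dict[str, str] = {
    "polar": "RKEDQN",
    "neutral": "GASTPHY",
    "hydrophobic": "CLVIMFW",
}


def ctdc_one_property(sequence: str, groups: Dict[str, str]) -> List[float]:
    """Return the fraction of residues in each group (composition descriptor)."""
    seq = sequence.upper()
    length = len(seq)
    if length == 0:
        raise ValueError("sequence must not be empty")
    # For each group, count residues belonging to it and normalise by length.
    return [
        sum(seq.count(aa) for aa in members) / length
        for members in groups.values()
    ]


if __name__ == "__main__":
    # Illustrative sequence only, not taken from the dataset in the record.
    print(ctdc_one_property("RKGACL", HYDROPHOBICITY_GROUPS))
```

Repeating the same computation for each additional property and concatenating the results would produce the full feature vector that is then passed to the classifier.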
format Online
Article
Text
id pubmed-7591939
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Hindawi
record_format MEDLINE/PubMed
spelling pubmed-7591939 2020-10-30 A Method for Identifying Vesicle Transport Proteins Based on LibSVM and MRMD Tao, Zhiyu Li, Yanjuan Teng, Zhixia Zhao, Yuming Comput Math Methods Med Research Article Hindawi 2020-10-19 /pmc/articles/PMC7591939/ /pubmed/33133228 http://dx.doi.org/10.1155/2020/8926750 Text en Copyright © 2020 Zhiyu Tao et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Tao, Zhiyu
Li, Yanjuan
Teng, Zhixia
Zhao, Yuming
A Method for Identifying Vesicle Transport Proteins Based on LibSVM and MRMD
title A Method for Identifying Vesicle Transport Proteins Based on LibSVM and MRMD
title_full A Method for Identifying Vesicle Transport Proteins Based on LibSVM and MRMD
title_fullStr A Method for Identifying Vesicle Transport Proteins Based on LibSVM and MRMD
title_full_unstemmed A Method for Identifying Vesicle Transport Proteins Based on LibSVM and MRMD
title_short A Method for Identifying Vesicle Transport Proteins Based on LibSVM and MRMD
title_sort method for identifying vesicle transport proteins based on libsvm and mrmd
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7591939/
https://www.ncbi.nlm.nih.gov/pubmed/33133228
http://dx.doi.org/10.1155/2020/8926750