
Using Amino Acid Physicochemical Distance Transformation for Fast Protein Remote Homology Detection

Protein remote homology detection is one of the most important problems in bioinformatics. Discriminative methods such as support vector machines (SVM) have shown superior performance. However, the performance of SVM-based methods depends on the vector representations of the protein sequences. Prior...


Bibliographic Details
Main authors: Liu, Bin; Wang, Xiaolong; Chen, Qingcai; Dong, Qiwen; Lan, Xun
Format: Online Article Text
Language: English
Published: Public Library of Science 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3460876/
https://www.ncbi.nlm.nih.gov/pubmed/23029559
http://dx.doi.org/10.1371/journal.pone.0046633
_version_ 1782245003179851776
author Liu, Bin
Wang, Xiaolong
Chen, Qingcai
Dong, Qiwen
Lan, Xun
author_sort Liu, Bin
collection PubMed
description Protein remote homology detection is one of the most important problems in bioinformatics. Discriminative methods such as support vector machines (SVM) have shown superior performance. However, the performance of SVM-based methods depends on the vector representations of the protein sequences. Prior work has demonstrated that sequence-order effects are relevant for discrimination, but little work has explored how to incorporate sequence-order information along with amino acid physicochemical properties into the prediction. To incorporate sequence-order effects into protein remote homology detection, the physicochemical distance transformation (PDT) method is proposed. Each protein sequence is converted into a series of numbers by using the physicochemical property scores in the amino acid index (AAIndex), and then the series is converted into a fixed-length vector by PDT. With this approach, the sequence-order information can be included in the feature vector at little computational cost. Finally, the feature vectors are input into a support vector machine classifier to detect protein remote homologies. Our experiments on a well-known benchmark show that the proposed method, SVM-PDT, achieves performance superior or comparable to current state-of-the-art methods, and that its computational cost is considerably lower than that of other methods. When the evolutionary information extracted from frequency profiles is combined with the PDT method, the profile-based PDT approach improves performance by 3.4% and 11.4% in terms of ROC score and ROC50 score, respectively. The proposed PDT efficiently captures the local sequence-order information of the protein, and the physicochemical properties extracted from the amino acid index are incorporated into the prediction. The physicochemical distance transformation provides a general framework that would be a valuable tool for protein-level study.
format Online
Article
Text
id pubmed-3460876
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3460876 2012-10-01 Using Amino Acid Physicochemical Distance Transformation for Fast Protein Remote Homology Detection Liu, Bin Wang, Xiaolong Chen, Qingcai Dong, Qiwen Lan, Xun PLoS One Research Article Public Library of Science 2012-09-28 /pmc/articles/PMC3460876/ /pubmed/23029559 http://dx.doi.org/10.1371/journal.pone.0046633 Text en © 2012 Liu et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Using Amino Acid Physicochemical Distance Transformation for Fast Protein Remote Homology Detection
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3460876/
https://www.ncbi.nlm.nih.gov/pubmed/23029559
http://dx.doi.org/10.1371/journal.pone.0046633
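
The description in this record outlines a pipeline: map each residue to a physicochemical property score, then turn the resulting number series into a fixed-length vector that reflects sequence order. The record does not give the formula, so the following is only a minimal sketch under stated assumptions: the transformation is taken to be the mean squared difference between standardized property values of residues a fixed distance (`lag`) apart, and the Kyte-Doolittle hydropathy scale stands in for a single AAIndex property. The names `HYDROPATHY`, `normalize`, and `pdt_features` are hypothetical and chosen for illustration. The final SVM classification step is not shown.

```python
from statistics import mean, pstdev

# Kyte-Doolittle hydropathy values, used here as a stand-in for one
# AAIndex property (assumption: the record does not say which properties
# or how many are used).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def normalize(scale):
    """Standardize a property scale to zero mean and unit standard
    deviation over the 20 amino acids (an assumed preprocessing step)."""
    values = list(scale.values())
    mu = mean(values)
    sigma = pstdev(values)
    return {aa: (v - mu) / sigma for aa, v in scale.items()}


def pdt_features(sequence, scales, max_lag):
    """Return len(scales) * max_lag features for one protein sequence.

    Each feature is the mean squared difference between the property
    values of residues `lag` positions apart (assumed form of the
    transformation). Residues missing from a scale raise KeyError.
    """
    features = []
    for scale in scales:
        values = [scale[aa] for aa in sequence]
        for lag in range(1, max_lag + 1):
            pairs = len(values) - lag
            if pairs <= 0:
                # Sequence too short for this lag: emit a neutral value.
                features.append(0.0)
                continue
            total = sum((values[i] - values[i + lag]) ** 2
                        for i in range(pairs))
            features.append(total / pairs)
    return features


scales = [normalize(HYDROPATHY)]
short_vec = pdt_features("MKVLAAGIVG", scales, max_lag=3)
long_vec = pdt_features("MKVLAAGIVGLLKKAGHE", scales, max_lag=3)
# Sequences of different lengths map to vectors of the same length.
print(len(short_vec), len(long_vec))
```

The vector length depends only on the number of property scales and on `max_lag`, not on the sequence length, which is what makes the output suitable as input to a classifier that expects fixed-size vectors. Each feature costs one pass over the sequence, consistent with the low computational cost the description emphasizes.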