Using Amino Acid Physicochemical Distance Transformation for Fast Protein Remote Homology Detection
Protein remote homology detection is one of the most important problems in bioinformatics. Discriminative methods such as support vector machines (SVM) have shown superior performance. However, the performance of SVM-based methods depends on the vector representations of the protein sequences. Prior...
Main Authors: | Liu, Bin; Wang, Xiaolong; Chen, Qingcai; Dong, Qiwen; Lan, Xun |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2012 |
Subjects: | Research Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3460876/ https://www.ncbi.nlm.nih.gov/pubmed/23029559 http://dx.doi.org/10.1371/journal.pone.0046633 |
_version_ | 1782245003179851776 |
---|---|
author | Liu, Bin; Wang, Xiaolong; Chen, Qingcai; Dong, Qiwen; Lan, Xun |
author_facet | Liu, Bin; Wang, Xiaolong; Chen, Qingcai; Dong, Qiwen; Lan, Xun |
author_sort | Liu, Bin |
collection | PubMed |
description | Protein remote homology detection is one of the most important problems in bioinformatics. Discriminative methods such as support vector machines (SVM) have shown superior performance. However, the performance of SVM-based methods depends on the vector representations of the protein sequences. Prior works have demonstrated that sequence-order effects are relevant for discrimination, but little work has explored how to incorporate the sequence-order information along with the amino acid physicochemical properties into the prediction. In order to incorporate the sequence-order effects into the protein remote homology detection, the physicochemical distance transformation (PDT) method is proposed. Each protein sequence is converted into a series of numbers by using the physicochemical property scores in the amino acid index (AAIndex), and then the sequence is converted into a fixed length vector by PDT. The sequence-order information can be efficiently included into the feature vector with little computational cost by this approach. Finally, the feature vectors are input into a support vector machine classifier to detect the protein remote homologies. Our experiments on a well-known benchmark show the proposed method SVM-PDT achieves superior or comparable performance with current state-of-the-art methods and its computational cost is considerably superior to those of other methods. When the evolutionary information extracted from the frequency profiles is combined with the PDT method, the profile-based PDT approach can improve the performance by 3.4% and 11.4% in terms of ROC score and ROC50 score respectively. The local sequence-order information of the protein can be efficiently captured by the proposed PDT and the physicochemical properties extracted from the amino acid index are incorporated into the prediction. The physicochemical distance transformation provides a general framework, which would be a valuable tool for protein-level study. |
format | Online Article Text |
id | pubmed-3460876 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2012 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-3460876 2012-10-01 Using Amino Acid Physicochemical Distance Transformation for Fast Protein Remote Homology Detection Liu, Bin Wang, Xiaolong Chen, Qingcai Dong, Qiwen Lan, Xun PLoS One Research Article Protein remote homology detection is one of the most important problems in bioinformatics. Discriminative methods such as support vector machines (SVM) have shown superior performance. However, the performance of SVM-based methods depends on the vector representations of the protein sequences. Prior works have demonstrated that sequence-order effects are relevant for discrimination, but little work has explored how to incorporate the sequence-order information along with the amino acid physicochemical properties into the prediction. In order to incorporate the sequence-order effects into the protein remote homology detection, the physicochemical distance transformation (PDT) method is proposed. Each protein sequence is converted into a series of numbers by using the physicochemical property scores in the amino acid index (AAIndex), and then the sequence is converted into a fixed length vector by PDT. The sequence-order information can be efficiently included into the feature vector with little computational cost by this approach. Finally, the feature vectors are input into a support vector machine classifier to detect the protein remote homologies. Our experiments on a well-known benchmark show the proposed method SVM-PDT achieves superior or comparable performance with current state-of-the-art methods and its computational cost is considerably superior to those of other methods. When the evolutionary information extracted from the frequency profiles is combined with the PDT method, the profile-based PDT approach can improve the performance by 3.4% and 11.4% in terms of ROC score and ROC50 score respectively. The local sequence-order information of the protein can be efficiently captured by the proposed PDT and the physicochemical properties extracted from the amino acid index are incorporated into the prediction. The physicochemical distance transformation provides a general framework, which would be a valuable tool for protein-level study. Public Library of Science 2012-09-28 /pmc/articles/PMC3460876/ /pubmed/23029559 http://dx.doi.org/10.1371/journal.pone.0046633 Text en © 2012 Liu et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
spellingShingle | Research Article Liu, Bin Wang, Xiaolong Chen, Qingcai Dong, Qiwen Lan, Xun Using Amino Acid Physicochemical Distance Transformation for Fast Protein Remote Homology Detection |
title | Using Amino Acid Physicochemical Distance Transformation for Fast Protein Remote Homology Detection |
title_full | Using Amino Acid Physicochemical Distance Transformation for Fast Protein Remote Homology Detection |
title_fullStr | Using Amino Acid Physicochemical Distance Transformation for Fast Protein Remote Homology Detection |
title_full_unstemmed | Using Amino Acid Physicochemical Distance Transformation for Fast Protein Remote Homology Detection |
title_short | Using Amino Acid Physicochemical Distance Transformation for Fast Protein Remote Homology Detection |
title_sort | using amino acid physicochemical distance transformation for fast protein remote homology detection |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3460876/ https://www.ncbi.nlm.nih.gov/pubmed/23029559 http://dx.doi.org/10.1371/journal.pone.0046633 |
work_keys_str_mv | AT liubin usingaminoacidphysicochemicaldistancetransformationforfastproteinremotehomologydetection AT wangxiaolong usingaminoacidphysicochemicaldistancetransformationforfastproteinremotehomologydetection AT chenqingcai usingaminoacidphysicochemicaldistancetransformationforfastproteinremotehomologydetection AT dongqiwen usingaminoacidphysicochemicaldistancetransformationforfastproteinremotehomologydetection AT lanxun usingaminoacidphysicochemicaldistancetransformationforfastproteinremotehomologydetection |
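
The description field above outlines a two-step encoding: residues are mapped to physicochemical property scores, and the resulting number series is reduced to a fixed-length vector. The record does not give the formula, so the sketch below is only one plausible reading and not the authors' implementation. It assumes each feature is the mean squared difference between the property values of residues a fixed distance (`lag`) apart. The names `pdt_features`, `encode`, and `max_lag` are hypothetical, and the Kyte-Doolittle hydropathy scale stands in for whichever AAIndex properties the paper actually uses.

```python
from typing import Dict, List

# Kyte-Doolittle hydropathy values, used here as a stand-in for one
# AAIndex property (assumption: the record does not name the properties).
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(sequence: str, scale: Dict[str, float]) -> List[float]:
    """Convert a protein sequence into a series of property scores."""
    return [scale[residue] for residue in sequence]


def pdt_features(sequence: str,
                 scales: List[Dict[str, float]],
                 max_lag: int) -> List[float]:
    """Return a vector of length len(scales) * max_lag.

    Assumed form: for each property and each lag d in 1..max_lag, the
    feature is the mean of (v[i] - v[i + d]) ** 2 over all valid i.
    """
    if len(sequence) <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    features: List[float] = []
    for scale in scales:
        values = encode(sequence, scale)
        n = len(values)
        for lag in range(1, max_lag + 1):
            squared = [(values[i] - values[i + lag]) ** 2
                       for i in range(n - lag)]
            features.append(sum(squared) / (n - lag))
    return features


# Sequences of different lengths map to vectors of the same length.
vector = pdt_features("MKVLAAGIVGLLLA", [KYTE_DOOLITTLE], max_lag=3)
print(len(vector))  # 1 property x 3 lags = 3 features
```

Because the vector length depends only on the number of properties and on `max_lag`, vectors built this way could be passed to any standard SVM implementation, which matches the final classification step mentioned in the description.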