Identifying DNA-binding proteins by combining support vector machine and PSSM distance transformation
BACKGROUND: DNA-binding proteins play a pivotal role in various intra- and extra-cellular activities ranging from DNA replication to gene expression control. Identification of DNA-binding proteins is one of the major challenges in the field of genome annotation. There have been several computational...
Main authors: | Xu, Ruifeng; Zhou, Jiyun; Wang, Hongpeng; He, Yulan; Wang, Xiaolong; Liu, Bin |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2015 |
Subjects: | Proceedings |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4331676/ https://www.ncbi.nlm.nih.gov/pubmed/25708928 http://dx.doi.org/10.1186/1752-0509-9-S1-S10 |
_version_ | 1782357756268773376 |
---|---|
author | Xu, Ruifeng Zhou, Jiyun Wang, Hongpeng He, Yulan Wang, Xiaolong Liu, Bin |
author_sort | Xu, Ruifeng |
collection | PubMed |
description | BACKGROUND: DNA-binding proteins play a pivotal role in various intra- and extra-cellular activities ranging from DNA replication to gene expression control. Identification of DNA-binding proteins is one of the major challenges in the field of genome annotation. Several computational methods have been proposed in the literature for DNA-binding protein identification. However, most of them cannot provide an invaluable knowledge base for our understanding of DNA-protein interactions. RESULTS: We first presented a new protein sequence encoding method called PSSM Distance Transformation, and then constructed a DNA-binding protein identification method (SVM-PSSM-DT) by combining PSSM Distance Transformation with a support vector machine (SVM). First, the PSSM profiles are generated by using the PSI-BLAST program to search the non-redundant (NR) database. Next, the PSSM profiles are transformed into uniform numeric representations by the distance transformation scheme. Lastly, the resulting uniform numeric representations are fed into an SVM classifier for prediction, which determines whether a sequence can bind to DNA. In a benchmark test on 525 DNA-binding and 550 non-DNA-binding proteins using jackknife validation, the present model achieved an ACC of 79.96%, an MCC of 0.622 and an AUC of 86.50%. This performance is considerably better than most of the existing state-of-the-art predictive methods. When tested on a recently constructed independent dataset, PDB186, SVM-PSSM-DT also achieved the best performance, with an ACC of 80.00%, an MCC of 0.647 and an AUC of 87.40%, and outperformed some existing state-of-the-art methods. CONCLUSIONS: The experimental results demonstrate that PSSM Distance Transformation is a viable protein sequence encoding method and that SVM-PSSM-DT is a useful tool for identifying DNA-binding proteins. A user-friendly web server for SVM-PSSM-DT was constructed and is freely accessible to the public at http://bioinformatics.hitsz.edu.cn/PSSM-DT/. |
format | Online Article Text |
id | pubmed-4331676 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4331676 2015-03-25 Identifying DNA-binding proteins by combining support vector machine and PSSM distance transformation Xu, Ruifeng Zhou, Jiyun Wang, Hongpeng He, Yulan Wang, Xiaolong Liu, Bin BMC Syst Biol Proceedings BioMed Central 2015-02-06 /pmc/articles/PMC4331676/ /pubmed/25708928 http://dx.doi.org/10.1186/1752-0509-9-S1-S10 Text en Copyright © 2015 Xu et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Proceedings Xu, Ruifeng Zhou, Jiyun Wang, Hongpeng He, Yulan Wang, Xiaolong Liu, Bin Identifying DNA-binding proteins by combining support vector machine and PSSM distance transformation |
title | Identifying DNA-binding proteins by combining support vector machine and PSSM distance transformation |
title_sort | identifying dna-binding proteins by combining support vector machine and pssm distance transformation |
topic | Proceedings |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4331676/ https://www.ncbi.nlm.nih.gov/pubmed/25708928 http://dx.doi.org/10.1186/1752-0509-9-S1-S10 |
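
The description field says that variable-length PSSM profiles are turned into "uniform numeric representations" by a distance transformation, but the record does not give the formula. The sketch below is one plausible reading, not the authors' implementation. It assumes that raw PSSM scores are squashed with a logistic function and that each feature is the average product of two column scores at positions separated by a distance `d`. The names `normalize`, `distance_features` and `max_distance` are hypothetical and chosen only for illustration.

```python
import math
from typing import List

# A PSSM here is a list of rows, one per sequence position,
# each row holding one score per amino-acid column (20 in practice).
Matrix = List[List[float]]


def normalize(pssm: Matrix) -> Matrix:
    """Map raw scores into (0, 1) with a logistic function (an assumed choice)."""
    return [[1.0 / (1.0 + math.exp(-s)) for s in row] for row in pssm]


def distance_features(pssm: Matrix, max_distance: int) -> List[float]:
    """Return a fixed-length vector of averaged pair products.

    For each distance d in 1..max_distance and each ordered pair of
    columns (a, b), average p[j][a] * p[j + d][b] over all valid
    positions j. The output length is max_distance * n_cols ** 2,
    so it does not depend on the sequence length.
    """
    p = normalize(pssm)
    length = len(p)
    n_cols = len(p[0]) if p else 0
    features: List[float] = []
    for d in range(1, max_distance + 1):
        pairs = length - d  # number of position pairs separated by d
        for a in range(n_cols):
            for b in range(n_cols):
                if pairs <= 0:
                    # Sequence too short for this distance: pad with zero.
                    features.append(0.0)
                    continue
                total = sum(p[j][a] * p[j + d][b] for j in range(pairs))
                features.append(total / pairs)
    return features
```

Because the vector length depends only on `max_distance` and the number of columns, proteins of different lengths yield inputs of the same size, which is what a standard SVM implementation (for example scikit-learn's `sklearn.svm.SVC`) requires.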