PSFM-DBT: Identifying DNA-Binding Proteins by Combing Position Specific Frequency Matrix and Distance-Bigram Transformation

Bibliographic Details
Main authors: Zhang, Jun; Liu, Bin
Format: Online, Article, Text
Language: English
Published: MDPI 2017
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5618505/
https://www.ncbi.nlm.nih.gov/pubmed/28841194
http://dx.doi.org/10.3390/ijms18091856
collection PubMed
description DNA-binding proteins play crucial roles in various biological processes, such as DNA replication and repair, transcriptional regulation and many other biological activities associated with DNA. Experimental techniques for identifying DNA-binding proteins are both time-consuming and expensive, so effective methods that identify these proteins from their sequences alone are needed. The key to sequence-based methods is an effective representation of protein sequences, and previous studies have reported that evolutionary information is crucial for DNA-binding protein identification. In this study, we employed four methods to extract evolutionary information from the Position Specific Frequency Matrix (PSFM): Residue Probing Transformation (RPT), Evolutionary Difference Transformation (EDT), Distance-Bigram Transformation (DBT), and Trigram Transformation (TT). Each method converts a PSFM into a fixed-length feature vector, which was then combined with a Support Vector Machine (SVM), yielding four predictors for identifying these proteins: PSFM-RPT, PSFM-EDT, PSFM-DBT, and PSFM-TT. Experimental results on the widely used benchmark dataset PDB1075 and the independent dataset PDB186 showed that all four methods achieved state-of-the-art performance, and that PSFM-DBT outperformed other existing methods in this field. For practical applications, a user-friendly webserver of PSFM-DBT has been established at http://bioinformatics.hitsz.edu.cn/PSFM-DBT/.
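The abstract states that each PSFM, a matrix with one row per residue and one column per amino acid, is converted into a fixed-length feature vector before SVM training. The Python sketch below illustrates that idea for a distance-bigram style transformation, assuming the commonly used form in which, for each distance d and amino-acid pair (i, j), the products of profile entries at positions k and k + d are averaged over the sequence. The function name `distance_bigram`, the averaging by L - d and the `max_distance` parameter are illustrative assumptions, not the authors' published code or their exact normalization.

```python
from typing import List

# Hypothetical type alias: L rows (residues) x n_aa columns (usually 20 amino acids).
Matrix = List[List[float]]


def distance_bigram(psfm: Matrix, max_distance: int) -> List[float]:
    """Sketch of distance-bigram style features from an L x n_aa profile matrix.

    For each distance d in 1..max_distance and each amino-acid pair (i, j),
    the product psfm[k][i] * psfm[k + d][j] is averaged over all valid k.
    The output length is n_aa * n_aa * max_distance (400 * max_distance for
    a 20-column profile), independent of the sequence length L.
    """
    length = len(psfm)
    if length <= max_distance:
        raise ValueError("sequence must be longer than max_distance")
    n_aa = len(psfm[0])
    features: List[float] = []
    for d in range(1, max_distance + 1):
        pairs = length - d  # number of residue pairs separated by distance d
        for i in range(n_aa):
            for j in range(n_aa):
                total = sum(psfm[k][i] * psfm[k + d][j] for k in range(pairs))
                features.append(total / pairs)  # average over the pairs
    return features


# Usage: proteins of different lengths yield vectors of the same length.
short_profile = [[0.05] * 20 for _ in range(5)]
long_profile = [[0.05] * 20 for _ in range(40)]
print(len(distance_bigram(short_profile, 2)), len(distance_bigram(long_profile, 2)))
```

Because every output entry is an average over positions, two proteins of different lengths map to vectors of the same size, which is what allows a standard SVM to consume them directly.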
id pubmed-5618505
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-5618505 2017-09-30
journal Int J Mol Sci
published 2017-08-25
rights © 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
topic Article