
The combination approach of SVM and ECOC for powerful identification and classification of transcription factor


Bibliographic Details
Main authors: Zheng, Guangyong; Qian, Ziliang; Yang, Qing; Wei, Chaochun; Xie, Lu; Zhu, Yangyong; Li, Yixue
Format: Text
Language: English
Published: BioMed Central 2008
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2440765/
https://www.ncbi.nlm.nih.gov/pubmed/18554421
http://dx.doi.org/10.1186/1471-2105-9-282
author Zheng, Guangyong
Qian, Ziliang
Yang, Qing
Wei, Chaochun
Xie, Lu
Zhu, Yangyong
Li, Yixue
collection PubMed
description BACKGROUND: Transcription factors (TFs) are core functional proteins that play important roles in the control of gene expression, and they are key elements in the construction of gene regulatory networks. Traditionally, they were identified and classified through experimental approaches. To save time and reduce costs, many computational methods have been developed to identify TFs among new proteins and to classify the resulting TFs. Although these methods have facilitated the screening of TFs to some extent, low accuracy is still a common problem. With the fast-growing number of new proteins, more precise algorithms for identifying TFs among new proteins and classifying the resulting TFs are in high demand. RESULTS: The support vector machine (SVM) algorithm was used to construct an automatic detector for TF identification, with protein domains and functional sites employed as feature vectors. The error-correcting output coding (ECOC) algorithm, which originated in the information and communication engineering fields, was combined with the SVM methodology for TF classification. The overall success rates of identification and classification reached 88.22% and 97.83%, respectively. Finally, a web site was constructed to let users access our tools (see Availability and requirements section for URL). CONCLUSION: The SVM method was a valid and stable means of TF identification with protein domains and functional sites as feature vectors. The ECOC algorithm is a powerful method for multi-class classification problems. When combined with the SVM method, it can remarkably increase the accuracy of TF classification using protein domains and functional sites as feature vectors. In addition, our work implies that the ECOC algorithm may succeed in a broad range of applications in biological data mining.
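The ECOC idea mentioned in the description can be illustrated with a minimal sketch. It is not the authors' implementation: the class names, the 7-bit code matrix, and the helper names `hamming` and `decode` are all assumptions made for illustration. Each class is assigned a binary codeword, one binary learner (an SVM in the paper) would be trained per bit position, and a new sample is assigned to the class whose codeword is closest in Hamming distance to the vector of binary predictions.

```python
from typing import Dict, List, Sequence

# Hypothetical code matrix: 4 illustrative classes, 7 binary problems.
# Each column would correspond to one binary learner (an SVM in the paper).
# Every pair of codewords differs in 4 bits, so one wrong bit is tolerated.
CODE_MATRIX: Dict[str, List[int]] = {
    "class_A": [1, 1, 1, 1, 1, 1, 1],
    "class_B": [0, 0, 0, 0, 1, 1, 1],
    "class_C": [0, 0, 1, 1, 0, 0, 1],
    "class_D": [0, 1, 0, 1, 0, 1, 0],
}


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions at which two equal-length bit vectors differ."""
    if len(a) != len(b):
        raise ValueError("bit vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


def decode(bits: Sequence[int]) -> str:
    """Return the class whose codeword is nearest to the predicted bits."""
    return min(CODE_MATRIX, key=lambda name: hamming(CODE_MATRIX[name], bits))


if __name__ == "__main__":
    # Pretend the seven binary learners produced class_C's codeword
    # with the first bit predicted incorrectly.
    noisy_prediction = [1, 0, 1, 1, 0, 0, 1]
    print(decode(noisy_prediction))
```

In this sketch a single mistaken binary prediction still decodes to the intended class, which is the error-correcting property that gives the method its name. The binary learners themselves are omitted here.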
format Text
id pubmed-2440765
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2440765 2008-06-27 BMC Bioinformatics Research Article BioMed Central 2008-06-16 /pmc/articles/PMC2440765/ /pubmed/18554421 http://dx.doi.org/10.1186/1471-2105-9-282 Text en Copyright © 2008 Zheng et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title The combination approach of SVM and ECOC for powerful identification and classification of transcription factor
topic Research Article