The combination approach of SVM and ECOC for powerful identification and classification of transcription factor
BACKGROUND: Transcription factors (TFs) are core functional proteins which play important roles in gene expression control, and they are key factors for gene regulation network construction. Traditionally, they were identified and classified through experimental approaches. In order to save time and...
Main authors: | Zheng, Guangyong; Qian, Ziliang; Yang, Qing; Wei, Chaochun; Xie, Lu; Zhu, Yangyong; Li, Yixue |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2008 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2440765/ https://www.ncbi.nlm.nih.gov/pubmed/18554421 http://dx.doi.org/10.1186/1471-2105-9-282 |
_version_ | 1782156576402964480 |
---|---|
author | Zheng, Guangyong; Qian, Ziliang; Yang, Qing; Wei, Chaochun; Xie, Lu; Zhu, Yangyong; Li, Yixue |
author_facet | Zheng, Guangyong; Qian, Ziliang; Yang, Qing; Wei, Chaochun; Xie, Lu; Zhu, Yangyong; Li, Yixue |
author_sort | Zheng, Guangyong |
collection | PubMed |
description | BACKGROUND: Transcription factors (TFs) are core functional proteins that play important roles in gene expression control, and they are key factors for gene regulation network construction. Traditionally, they were identified and classified through experimental approaches. To save time and reduce costs, many computational methods have been developed to identify TFs among new proteins and to classify the resulting TFs. Although these methods have facilitated the screening of TFs to some extent, low accuracy is still a common problem. With the fast-growing number of new proteins, more precise algorithms for identifying TFs among new proteins and classifying the resulting TFs are in high demand. RESULTS: The support vector machine (SVM) algorithm was used to construct an automatic detector for TF identification, with protein domains and functional sites employed as feature vectors. The error-correcting output coding (ECOC) algorithm, which originated in the information and communication engineering fields, was combined with the SVM methodology for TF classification. The overall success rates of identification and classification reached 88.22% and 97.83%, respectively. Finally, a web site was constructed to let users access our tools (see the Availability and requirements section for the URL). CONCLUSION: The SVM method was a valid and stable means of TF identification with protein domains and functional sites as feature vectors. The ECOC algorithm is a powerful method for multi-class classification problems. When combined with the SVM method, it can remarkably increase the accuracy of TF classification using protein domains and functional sites as feature vectors. In addition, our work suggests that the ECOC algorithm may succeed in a broad range of applications in biological data mining. |
format | Text |
id | pubmed-2440765 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2008 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-2440765 2008-06-27 The combination approach of SVM and ECOC for powerful identification and classification of transcription factor Zheng, Guangyong; Qian, Ziliang; Yang, Qing; Wei, Chaochun; Xie, Lu; Zhu, Yangyong; Li, Yixue BMC Bioinformatics Research Article BACKGROUND: Transcription factors (TFs) are core functional proteins that play important roles in gene expression control, and they are key factors for gene regulation network construction. Traditionally, they were identified and classified through experimental approaches. To save time and reduce costs, many computational methods have been developed to identify TFs among new proteins and to classify the resulting TFs. Although these methods have facilitated the screening of TFs to some extent, low accuracy is still a common problem. With the fast-growing number of new proteins, more precise algorithms for identifying TFs among new proteins and classifying the resulting TFs are in high demand. RESULTS: The support vector machine (SVM) algorithm was used to construct an automatic detector for TF identification, with protein domains and functional sites employed as feature vectors. The error-correcting output coding (ECOC) algorithm, which originated in the information and communication engineering fields, was combined with the SVM methodology for TF classification. The overall success rates of identification and classification reached 88.22% and 97.83%, respectively. Finally, a web site was constructed to let users access our tools (see the Availability and requirements section for the URL). CONCLUSION: The SVM method was a valid and stable means of TF identification with protein domains and functional sites as feature vectors. The ECOC algorithm is a powerful method for multi-class classification problems. When combined with the SVM method, it can remarkably increase the accuracy of TF classification using protein domains and functional sites as feature vectors. In addition, our work suggests that the ECOC algorithm may succeed in a broad range of applications in biological data mining. BioMed Central 2008-06-16 /pmc/articles/PMC2440765/ /pubmed/18554421 http://dx.doi.org/10.1186/1471-2105-9-282 Text en Copyright © 2008 Zheng et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Zheng, Guangyong Qian, Ziliang Yang, Qing Wei, Chaochun Xie, Lu Zhu, Yangyong Li, Yixue The combination approach of SVM and ECOC for powerful identification and classification of transcription factor |
title | The combination approach of SVM and ECOC for powerful identification and classification of transcription factor |
title_full | The combination approach of SVM and ECOC for powerful identification and classification of transcription factor |
title_fullStr | The combination approach of SVM and ECOC for powerful identification and classification of transcription factor |
title_full_unstemmed | The combination approach of SVM and ECOC for powerful identification and classification of transcription factor |
title_short | The combination approach of SVM and ECOC for powerful identification and classification of transcription factor |
title_sort | combination approach of svm and ecoc for powerful identification and classification of transcription factor |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2440765/ https://www.ncbi.nlm.nih.gov/pubmed/18554421 http://dx.doi.org/10.1186/1471-2105-9-282 |
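The abstract describes combining error-correcting output coding (ECOC) with binary classifiers for multi-class TF classification. The following is a minimal sketch of that general idea, not the authors' implementation: each class gets a binary codeword, one binary learner is trained per codeword bit, and a new sample is assigned to the class whose codeword is closest in Hamming distance to the predicted bits. Everything specific here is an assumption made for illustration: the four class names, the 7-bit exhaustive codebook, the toy presence/absence features, and the use of a simple perceptron in place of an SVM so that the block stays dependency-free.

```python
# Illustrative ECOC sketch. A perceptron stands in for the SVM binary learner,
# and all class names, codewords, and feature vectors are hypothetical toy values.

# Exhaustive 7-bit code for 4 classes; any two codewords differ in 4 bits,
# so a single wrong binary prediction can still be decoded correctly.
CODEBOOK = {
    "class_A": (1, 1, 1, 1, 1, 1, 1),
    "class_B": (0, 0, 0, 0, 1, 1, 1),
    "class_C": (0, 0, 1, 1, 0, 0, 1),
    "class_D": (0, 1, 0, 1, 0, 1, 0),
}


def hamming(a, b):
    """Number of positions at which two bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def decode(bits):
    """Return the class whose codeword is nearest to the predicted bits."""
    return min(CODEBOOK, key=lambda label: hamming(CODEBOOK[label], bits))


def train_perceptron(samples, targets, max_epochs=100):
    """Train a linear binary learner on targets in {-1, +1}."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(samples, targets):
            score = sum(w * v for w, v in zip(weights, x)) + bias
            if y * score <= 0:  # misclassified or on the boundary: update
                weights = [w + y * v for w, v in zip(weights, x)]
                bias += y
                mistakes += 1
        if mistakes == 0:  # every training sample is on the correct side
            break
    return weights, bias


def train_ecoc(samples, labels):
    """Train one binary learner per codeword bit."""
    n_bits = len(next(iter(CODEBOOK.values())))
    learners = []
    for bit in range(n_bits):
        targets = [1 if CODEBOOK[label][bit] else -1 for label in labels]
        learners.append(train_perceptron(samples, targets))
    return learners


def predict_ecoc(learners, x):
    """Predict every codeword bit, then decode to the nearest class."""
    bits = tuple(
        1 if sum(w * v for w, v in zip(weights, x)) + bias > 0 else 0
        for weights, bias in learners
    )
    return decode(bits)


# Toy features: presence/absence of four signature domains plus one shared domain.
SAMPLES = [
    (1, 0, 0, 0, 0), (1, 0, 0, 0, 1),
    (0, 1, 0, 0, 0), (0, 1, 0, 0, 1),
    (0, 0, 1, 0, 0), (0, 0, 1, 0, 1),
    (0, 0, 0, 1, 0), (0, 0, 0, 1, 1),
]
LABELS = [
    "class_A", "class_A", "class_B", "class_B",
    "class_C", "class_C", "class_D", "class_D",
]

LEARNERS = train_ecoc(SAMPLES, LABELS)

if __name__ == "__main__":
    for sample, label in zip(SAMPLES, LABELS):
        print(sample, label, predict_ecoc(LEARNERS, sample))
```

The error-correcting property comes from the codebook alone: because the codewords are at least four bits apart, `decode` still returns the right class when any one of the seven binary learners is wrong. In practice the perceptron would be replaced by an SVM and the codebook sized to the real number of TF classes.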