Identification of DNA-Binding Proteins Using Mixed Feature Representation Methods

DNA-binding proteins play vital roles in cellular processes such as DNA packaging, replication, transcription, regulation, and other DNA-associated activities. Current prediction methods are mainly based on machine learning, and their accuracy depends largely on the feature extraction method. Therefo...

Bibliographic Details
Main authors:	Qu, Kaiyang, Han, Ke, Wu, Song, Wang, Guohua, Wei, Leyi
Format:	Online Article Text
Language:	English
Published:	MDPI 2017
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6151557/ https://www.ncbi.nlm.nih.gov/pubmed/28937647 http://dx.doi.org/10.3390/molecules22101602

author	Qu, Kaiyang Han, Ke Wu, Song Wang, Guohua Wei, Leyi
author_sort	Qu, Kaiyang
collection	PubMed
description	DNA-binding proteins play vital roles in cellular processes such as DNA packaging, replication, transcription, regulation, and other DNA-associated activities. Current prediction methods are mainly based on machine learning, and their accuracy depends largely on the feature extraction method. An efficient feature representation method is therefore important for improving classification accuracy. However, existing feature representation methods cannot efficiently distinguish DNA-binding proteins from non-DNA-binding proteins. In this paper, a multi-feature representation method that combines three feature representation methods, namely K-Skip-N-Grams, Information theory, and Sequential and structural features (SSF), is used to represent the protein sequences and improve feature representation ability. The classifier is a support vector machine. The mixed-feature representation method is evaluated using 10-fold cross-validation and a test set. Feature vectors obtained by combining all three feature extraction methods show the best performance in 10-fold cross-validation, both without dimensionality reduction and with dimensionality reduction by max-relevance-max-distance. Moreover, the reduced mixed-feature method performs better than the non-reduced mixed-feature method. Feature vectors that combine SSF and K-Skip-N-Grams show the best performance on the test set. Overall, mixed features outperform single features.
format	Online Article Text
id	pubmed-6151557
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-6151557 2018-11-13 Identification of DNA-Binding Proteins Using Mixed Feature Representation Methods Qu, Kaiyang Han, Ke Wu, Song Wang, Guohua Wei, Leyi Molecules Article MDPI 2017-09-22 /pmc/articles/PMC6151557/ /pubmed/28937647 http://dx.doi.org/10.3390/molecules22101602 Text en © 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title	Identification of DNA-Binding Proteins Using Mixed Feature Representation Methods
topic	Article
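
The abstract in this record names K-Skip-N-Grams as one of the three feature representations but does not define it. The following is a minimal sketch of one common reading for n = 2 (k-skip bigrams over the 20 standard amino acids), written only to illustrate the idea; the function name, the normalization, and the handling of non-standard residues are assumptions and are not taken from the paper.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def k_skip_bigram_features(sequence: str, k: int = 1) -> list[float]:
    """Return a 400-dimensional frequency vector of k-skip bigrams.

    Assumed definition: a k-skip bigram is the residue pair
    (sequence[i], sequence[i + 1 + s]) for any skip distance s in 0..k.
    Pairs containing a non-standard residue are ignored.
    """
    counts = Counter()
    for skip in range(k + 1):
        gap = skip + 1  # distance between the two residues of a pair
        for i in range(len(sequence) - gap):
            counts[sequence[i] + sequence[i + gap]] += 1

    # Fixed ordering of all 20 x 20 residue pairs: "AA", "AC", ..., "YY".
    vocabulary = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    total = sum(counts[pair] for pair in vocabulary)
    if total == 0:
        return [0.0] * len(vocabulary)
    # Normalize counts to frequencies so sequences of different length compare.
    return [counts[pair] / total for pair in vocabulary]
```

Under these assumptions, a vector like this could be concatenated with other per-sequence features and passed to a support vector machine (for example scikit-learn's SVC, evaluated with 10-fold cross-validation), which is the general shape of the pipeline the abstract describes.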
