Cargando…

Protein Sub-Nuclear Localization Based on Effective Fusion Representations and Dimension Reduction Algorithm LDA

An effective representation of a protein sequence plays a crucial role in protein sub-nuclear localization. The existing representations, such as dipeptide composition (DipC), pseudo-amino acid composition (PseAAC) and position specific scoring matrix (PSSM), are insufficient to represent protein se...

Descripción completa

Detalles Bibliográficos
Autores principales: Wang, Shunfang, Liu, Shuhui
Formato: Online Artículo Texto
Lenguaje:English
Publicado: MDPI 2015
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4691178/
https://www.ncbi.nlm.nih.gov/pubmed/26703574
http://dx.doi.org/10.3390/ijms161226237
_version_ 1782407118009139200
author Wang, Shunfang
Liu, Shuhui
author_facet Wang, Shunfang
Liu, Shuhui
author_sort Wang, Shunfang
collection PubMed
description An effective representation of a protein sequence plays a crucial role in protein sub-nuclear localization. The existing representations, such as dipeptide composition (DipC), pseudo-amino acid composition (PseAAC) and position specific scoring matrix (PSSM), are insufficient to represent protein sequence due to their single perspectives. Thus, this paper proposes two fusion feature representations of DipPSSM and PseAAPSSM to integrate PSSM with DipC and PseAAC, respectively. When constructing each fusion representation, we introduce the balance factors to value the importance of its components. The optimal values of the balance factors are sought by genetic algorithm. Due to the high dimensionality of the proposed representations, linear discriminant analysis (LDA) is used to find its important low dimensional structure, which is essential for classification and location prediction. The numerical experiments on two public datasets with KNN classifier and cross-validation tests showed that in terms of the common indexes of sensitivity, specificity, accuracy and MCC, the proposed fusing representations outperform the traditional representations in protein sub-nuclear localization, and the representation treated by LDA outperforms the untreated one.
format Online
Article
Text
id pubmed-4691178
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-46911782016-01-06 Protein Sub-Nuclear Localization Based on Effective Fusion Representations and Dimension Reduction Algorithm LDA Wang, Shunfang Liu, Shuhui Int J Mol Sci Article An effective representation of a protein sequence plays a crucial role in protein sub-nuclear localization. The existing representations, such as dipeptide composition (DipC), pseudo-amino acid composition (PseAAC) and position specific scoring matrix (PSSM), are insufficient to represent protein sequence due to their single perspectives. Thus, this paper proposes two fusion feature representations of DipPSSM and PseAAPSSM to integrate PSSM with DipC and PseAAC, respectively. When constructing each fusion representation, we introduce the balance factors to value the importance of its components. The optimal values of the balance factors are sought by genetic algorithm. Due to the high dimensionality of the proposed representations, linear discriminant analysis (LDA) is used to find its important low dimensional structure, which is essential for classification and location prediction. The numerical experiments on two public datasets with KNN classifier and cross-validation tests showed that in terms of the common indexes of sensitivity, specificity, accuracy and MCC, the proposed fusing representations outperform the traditional representations in protein sub-nuclear localization, and the representation treated by LDA outperforms the untreated one. MDPI 2015-12-19 /pmc/articles/PMC4691178/ /pubmed/26703574 http://dx.doi.org/10.3390/ijms161226237 Text en © 2015 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons by Attribution (CC-BY) license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Wang, Shunfang
Liu, Shuhui
Protein Sub-Nuclear Localization Based on Effective Fusion Representations and Dimension Reduction Algorithm LDA
title Protein Sub-Nuclear Localization Based on Effective Fusion Representations and Dimension Reduction Algorithm LDA
title_full Protein Sub-Nuclear Localization Based on Effective Fusion Representations and Dimension Reduction Algorithm LDA
title_fullStr Protein Sub-Nuclear Localization Based on Effective Fusion Representations and Dimension Reduction Algorithm LDA
title_full_unstemmed Protein Sub-Nuclear Localization Based on Effective Fusion Representations and Dimension Reduction Algorithm LDA
title_short Protein Sub-Nuclear Localization Based on Effective Fusion Representations and Dimension Reduction Algorithm LDA
title_sort protein sub-nuclear localization based on effective fusion representations and dimension reduction algorithm lda
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4691178/
https://www.ncbi.nlm.nih.gov/pubmed/26703574
http://dx.doi.org/10.3390/ijms161226237
work_keys_str_mv AT wangshunfang proteinsubnuclearlocalizationbasedoneffectivefusionrepresentationsanddimensionreductionalgorithmlda
AT liushuhui proteinsubnuclearlocalizationbasedoneffectivefusionrepresentationsanddimensionreductionalgorithmlda