Accurate prediction of nuclear receptors with conjoint triad feature
BACKGROUND: Nuclear receptors (NRs) form a large family of ligand-inducible transcription factors that regulate gene expression involved in numerous physiological phenomena, such as embryogenesis, homeostasis, cell growth and death. These nuclear receptor-related pathways are important targets of...
Main authors: | Wang, Hongchu; Hu, Xuehai |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2015 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4668603/ https://www.ncbi.nlm.nih.gov/pubmed/26630876 http://dx.doi.org/10.1186/s12859-015-0828-1 |
_version_ | 1782403993299845120 |
---|---|
author | Wang, Hongchu Hu, Xuehai |
author_facet | Wang, Hongchu Hu, Xuehai |
author_sort | Wang, Hongchu |
collection | PubMed |
description | BACKGROUND: Nuclear receptors (NRs) form a large family of ligand-inducible transcription factors that regulate gene expression involved in numerous physiological phenomena, such as embryogenesis, homeostasis, cell growth and death. These nuclear receptor-related pathways are important targets of marketed drugs. Therefore, the design of a reliable computational model for predicting NRs from amino acid sequence has become a significant biomedical problem. RESULTS: Conjoint triad feature (CTF) mainly considers neighbor relationships in protein sequences by encoding each protein sequence using the triad (three contiguous amino acids) frequency distribution extracted from a 7-letter reduced alphabet. In addition, chaos game representation (CGR) can investigate the patterns hidden in protein sequences and visually reveal previously unknown structure. In this paper, three methods, CTF, CGR, and amino acid composition (AAC), are applied to represent the protein samples. By considering different combinations of the three methods, we study seven groups of features, and each group is evaluated by the 10-fold cross-validation test. Meanwhile, a new non-redundant dataset containing 474 NR sequences and 500 non-NR sequences is built based on the latest NucleaRDB database. Comparing the results of numerical experiments, the group of combined features with CTF and AAC achieves the best result, with an accuracy of 96.30 % for identifying NRs from non-NRs. Moreover, a protein classified as an NR is passed to the second level, which assigns it to one of the eight main subfamilies. At the second level, the group of combined features with CTF and AAC also achieves the best accuracy, 94.73 %. Subsequently, the proposed predictor is compared with two existing methods, and the comparisons show that the accuracies at the two levels increase significantly to 98.79 % (NR-2L: 92.56 %; iNR-PhysChem: 98.18 %; the first level) and 93.71 % (NR-2L: 88.68 %; iNR-PhysChem: 92.45 %; the second level) with the introduction of our CTF-based method. Finally, each component of the CTF features is analyzed via a statistical significance test, and a simplified model with only the resulting top-50 significant features achieves an accuracy of 95.28 %. CONCLUSIONS: The experimental results demonstrate that our CTF-based method is an effective way to predict nuclear receptor proteins. Furthermore, the top-50 significant features obtained from the statistical significance test are considered the “intrinsic features” in predicting NRs based on the analysis of relative importance. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0828-1) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-4668603 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4668603 2015-12-04 Accurate prediction of nuclear receptors with conjoint triad feature Wang, Hongchu Hu, Xuehai BMC Bioinformatics Methodology Article BACKGROUND: Nuclear receptors (NRs) form a large family of ligand-inducible transcription factors that regulate gene expression involved in numerous physiological phenomena, such as embryogenesis, homeostasis, cell growth and death. These nuclear receptor-related pathways are important targets of marketed drugs. Therefore, the design of a reliable computational model for predicting NRs from amino acid sequence has become a significant biomedical problem. RESULTS: Conjoint triad feature (CTF) mainly considers neighbor relationships in protein sequences by encoding each protein sequence using the triad (three contiguous amino acids) frequency distribution extracted from a 7-letter reduced alphabet. In addition, chaos game representation (CGR) can investigate the patterns hidden in protein sequences and visually reveal previously unknown structure. In this paper, three methods, CTF, CGR, and amino acid composition (AAC), are applied to represent the protein samples. By considering different combinations of the three methods, we study seven groups of features, and each group is evaluated by the 10-fold cross-validation test. Meanwhile, a new non-redundant dataset containing 474 NR sequences and 500 non-NR sequences is built based on the latest NucleaRDB database. Comparing the results of numerical experiments, the group of combined features with CTF and AAC achieves the best result, with an accuracy of 96.30 % for identifying NRs from non-NRs. Moreover, a protein classified as an NR is passed to the second level, which assigns it to one of the eight main subfamilies. At the second level, the group of combined features with CTF and AAC also achieves the best accuracy, 94.73 %. Subsequently, the proposed predictor is compared with two existing methods, and the comparisons show that the accuracies at the two levels increase significantly to 98.79 % (NR-2L: 92.56 %; iNR-PhysChem: 98.18 %; the first level) and 93.71 % (NR-2L: 88.68 %; iNR-PhysChem: 92.45 %; the second level) with the introduction of our CTF-based method. Finally, each component of the CTF features is analyzed via a statistical significance test, and a simplified model with only the resulting top-50 significant features achieves an accuracy of 95.28 %. CONCLUSIONS: The experimental results demonstrate that our CTF-based method is an effective way to predict nuclear receptor proteins. Furthermore, the top-50 significant features obtained from the statistical significance test are considered the “intrinsic features” in predicting NRs based on the analysis of relative importance. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0828-1) contains supplementary material, which is available to authorized users. BioMed Central 2015-12-03 /pmc/articles/PMC4668603/ /pubmed/26630876 http://dx.doi.org/10.1186/s12859-015-0828-1 Text en © Wang and Hu. 2015 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Methodology Article Wang, Hongchu Hu, Xuehai Accurate prediction of nuclear receptors with conjoint triad feature |
title | Accurate prediction of nuclear receptors with conjoint triad feature |
title_full | Accurate prediction of nuclear receptors with conjoint triad feature |
title_fullStr | Accurate prediction of nuclear receptors with conjoint triad feature |
title_full_unstemmed | Accurate prediction of nuclear receptors with conjoint triad feature |
title_short | Accurate prediction of nuclear receptors with conjoint triad feature |
title_sort | accurate prediction of nuclear receptors with conjoint triad feature |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4668603/ https://www.ncbi.nlm.nih.gov/pubmed/26630876 http://dx.doi.org/10.1186/s12859-015-0828-1 |
work_keys_str_mv | AT wanghongchu accuratepredictionofnuclearreceptorswithconjointtriadfeature AT huxuehai accuratepredictionofnuclearreceptorswithconjointtriadfeature |
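
The description above says CTF encodes a sequence by the frequency distribution of triads (three contiguous amino acids) over a 7-letter reduced alphabet, and that the best feature group combines CTF with AAC. The following is a minimal sketch of that idea, not the authors' implementation. The record does not list the seven classes, so the grouping in `CLASSES` is an assumption (a grouping commonly used for conjoint triads), as are the plain relative-frequency normalization and the hypothetical function names `ctf`, `aac` and `ctf_aac`.

```python
from itertools import product

# ASSUMPTION: a 7-class grouping commonly used for conjoint triads.
# The record above does not state which classes the paper uses.
CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: idx for idx, group in enumerate(CLASSES) for aa in group}

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"        # the 20 standard residues
TRIADS = list(product(range(7), repeat=3))  # 7 * 7 * 7 = 343 possible triads


def ctf(sequence):
    """Return 343 triad frequencies over the reduced alphabet.

    Non-standard residues (e.g. X) are dropped before counting, which is a
    simplification: it makes their neighbors adjacent.
    """
    codes = [AA_TO_CLASS[aa] for aa in sequence.upper() if aa in AA_TO_CLASS]
    counts = dict.fromkeys(TRIADS, 0)
    # Slide a window of three contiguous residues along the sequence.
    for i in range(len(codes) - 2):
        counts[tuple(codes[i:i + 3])] += 1
    total = max(len(codes) - 2, 1)  # avoid division by zero for short input
    # ASSUMPTION: plain relative frequencies; the paper may normalize differently.
    return [counts[t] / total for t in TRIADS]


def aac(sequence):
    """Return the 20 amino acid composition fractions."""
    residues = [aa for aa in sequence.upper() if aa in AA_TO_CLASS]
    total = max(len(residues), 1)
    return [residues.count(aa) / total for aa in AMINO_ACIDS]


def ctf_aac(sequence):
    """Concatenate CTF and AAC into one feature vector (343 + 20 values)."""
    return ctf(sequence) + aac(sequence)
```

Calling `ctf_aac` on any protein sequence yields a fixed-length vector of 363 values, which could then be passed to a classifier and evaluated with 10-fold cross-validation as the description outlines. The classifier itself is not named in this record, so it is left out of the sketch.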