
DCGR: feature extractions from protein sequences based on CGR via remodeling multiple information

BACKGROUND: Protein feature extraction plays an important role in the areas of similarity analysis of protein sequences and prediction of protein structures, functions and interactions. The feature extraction based on graphical representation is one of the most effective and efficient ways. However,...


Bibliographic Details
Main Authors: Mu, Zengchao, Yu, Ting, Qi, Enfeng, Liu, Juntao, Li, Guojun
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6587251/
https://www.ncbi.nlm.nih.gov/pubmed/31221087
http://dx.doi.org/10.1186/s12859-019-2943-x
author Mu, Zengchao
Yu, Ting
Qi, Enfeng
Liu, Juntao
Li, Guojun
collection PubMed
description BACKGROUND: Protein feature extraction plays an important role in the areas of similarity analysis of protein sequences and prediction of protein structures, functions and interactions. The feature extraction based on graphical representation is one of the most effective and efficient ways. However, most existing methods suffer limitations from their method design. RESULTS: We introduce DCGR, a novel method for extracting features from protein sequences based on the chaos game representation, which is developed by constructing CGR curves of protein sequences according to physicochemical properties of amino acids, followed by converting the CGR curves into multi-dimensional feature vectors by using the distributions of points in CGR images. Tested on five data sets, DCGR was significantly superior to the state-of-the-art feature extraction methods. CONCLUSION: The DCGR is practically powerful for extracting effective features from protein sequences, and therefore important in similarity analysis of protein sequences, study of protein-protein interactions and prediction of protein functions. It is freely available at https://sourceforge.net/projects/transcriptomeassembly/files/Feature%20Extraction. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-019-2943-x) contains supplementary material, which is available to authorized users.
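The abstract above describes two steps: drawing a chaos game representation (CGR) of a protein sequence, then turning the distribution of CGR points into a feature vector. The following is a minimal sketch of a generic CGR under stated assumptions, not the authors' DCGR implementation. The vertex ordering (alphabetical one-letter codes on a regular 20-gon), the contraction ratio of 1/2, the grid size, and all function names are illustrative choices that do not come from this record; the paper orders amino acids by physicochemical properties, which are not listed here.

```python
import math

# Assumption: placeholder vertex ordering (alphabetical one-letter codes).
# The paper orders residues by physicochemical properties not given in this record.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def vertices(order=AMINO_ACIDS):
    """Place each amino acid on a vertex of a regular polygon on the unit circle."""
    n = len(order)
    return {
        aa: (math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
        for k, aa in enumerate(order)
    }


def cgr_points(seq, order=AMINO_ACIDS):
    """Generic chaos game: move halfway from the current point to the residue's vertex."""
    verts = vertices(order)
    x, y = 0.0, 0.0  # start at the polygon centre
    points = []
    for aa in seq:
        if aa not in verts:  # skip ambiguous or unknown residues such as 'X'
            continue
        vx, vy = verts[aa]
        x, y = (x + vx) / 2, (y + vy) / 2
        points.append((x, y))
    return points


def grid_features(points, bins=4):
    """Fraction of CGR points in each cell of a bins x bins grid over [-1, 1]^2."""
    counts = [0] * (bins * bins)
    for x, y in points:
        col = min(int((x + 1) / 2 * bins), bins - 1)
        row = min(int((y + 1) / 2 * bins), bins - 1)
        counts[row * bins + col] += 1
    total = len(points) or 1  # avoid division by zero for empty input
    return [c / total for c in counts]


# Hypothetical example sequence, used only to exercise the functions.
features = grid_features(cgr_points("MKVLAAGIVG"))
```

Here `features` is a fixed-length vector (16 values for the default grid) regardless of sequence length, which is what makes point-distribution features convenient for comparing proteins of different sizes.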
format Online
Article
Text
id pubmed-6587251
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-65872512019-06-27 DCGR: feature extractions from protein sequences based on CGR via remodeling multiple information Mu, Zengchao Yu, Ting Qi, Enfeng Liu, Juntao Li, Guojun BMC Bioinformatics Methodology Article BACKGROUND: Protein feature extraction plays an important role in the areas of similarity analysis of protein sequences and prediction of protein structures, functions and interactions. The feature extraction based on graphical representation is one of the most effective and efficient ways. However, most existing methods suffer limitations from their method design. RESULTS: We introduce DCGR, a novel method for extracting features from protein sequences based on the chaos game representation, which is developed by constructing CGR curves of protein sequences according to physicochemical properties of amino acids, followed by converting the CGR curves into multi-dimensional feature vectors by using the distributions of points in CGR images. Tested on five data sets, DCGR was significantly superior to the state-of-the-art feature extraction methods. CONCLUSION: The DCGR is practically powerful for extracting effective features from protein sequences, and therefore important in similarity analysis of protein sequences, study of protein-protein interactions and prediction of protein functions. It is freely available at https://sourceforge.net/projects/transcriptomeassembly/files/Feature%20Extraction. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-019-2943-x) contains supplementary material, which is available to authorized users. BioMed Central 2019-06-20 /pmc/articles/PMC6587251/ /pubmed/31221087 http://dx.doi.org/10.1186/s12859-019-2943-x Text en © The Author(s). 
2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title DCGR: feature extractions from protein sequences based on CGR via remodeling multiple information
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6587251/
https://www.ncbi.nlm.nih.gov/pubmed/31221087
http://dx.doi.org/10.1186/s12859-019-2943-x