iMPT-FDNPL: Identification of Membrane Protein Types with Functional Domains and a Natural Language Processing Approach

Membrane proteins are an important class of proteins that play essential roles in several cellular processes. Based on their intramolecular arrangements and positions in a cell, membrane proteins can be divided into several types. The types of a membrane protein are reported to be closely related to...

Bibliographic Details
Main Authors: Chen, Wei; Chen, Lei; Dai, Qi
Format: Online Article Text
Language: English
Published: Hindawi 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8523280/
https://www.ncbi.nlm.nih.gov/pubmed/34671418
http://dx.doi.org/10.1155/2021/7681497
author Chen, Wei
Chen, Lei
Dai, Qi
collection PubMed
description Membrane proteins are an important class of proteins that play essential roles in several cellular processes. Based on their intramolecular arrangements and positions in a cell, membrane proteins can be divided into several types. The types of a membrane protein are reported to be closely related to its functions. Determining membrane protein types has been an active research topic in recent years, and many computational methods have been proposed. Some of them use functional domain information to encode proteins, but the encoding procedure has remained crude. In this study, we designed a novel feature extraction scheme that derives informative protein features from functional domain information. The scheme treats domains as words and proteins, each represented by its domains, as sentences. The natural language processing method word2vec was applied to obtain feature vectors for domains, which were then refined into protein features. Based on these features, RAndom k-labELsets (RAKEL) with random forest as the base classifier was used to build a multilabel classifier named iMPT-FDNPL. Tenfold cross-validation indicated that the classifier performs well. Furthermore, it was superior to classifiers based on features derived from functional domains via a one-hot scheme or from other protein properties, suggesting that the protein features generated by the proposed scheme are effective.
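The "domains as words, proteins as sentences" idea in the description can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the domain identifiers, the three-dimensional vectors (a hand-made stand-in for embeddings that word2vec would learn), and in particular the use of a plain average to turn domain vectors into a protein vector, since the record does not say how the refinement is done.

```python
from statistics import fmean

# Hypothetical domain embeddings. In the described method these would be
# learned by word2vec; the IDs and numbers below are invented.
DOMAIN_VECTORS = {
    "IPR000001": [1.0, 0.0, 2.0],
    "IPR000002": [3.0, 2.0, 0.0],
    "IPR000003": [0.0, 4.0, 4.0],
}


def protein_vector(domains, table=DOMAIN_VECTORS):
    """Average the vectors of a protein's known domains.

    Averaging is an assumed refinement step, not the authors' stated one.
    Domains missing from the table are skipped; a protein with no known
    domains gets an all-zero vector.
    """
    known = [table[d] for d in domains if d in table]
    if not known:
        return [0.0] * len(next(iter(table.values())))
    # zip(*known) groups the i-th component of every domain vector together.
    return [fmean(component) for component in zip(*known)]


# Each protein is a "sentence" whose "words" are its functional domains.
protein_sentences = {
    "P_example_1": ["IPR000001", "IPR000002"],
    "P_example_2": ["IPR000003", "IPR999999"],  # second ID is not in the table
}

features = {pid: protein_vector(doms) for pid, doms in protein_sentences.items()}
```

Fixed-length vectors of this kind are what a multilabel classifier would then consume. The sketch leaves out both the word2vec training and the RAKEL classifier.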
format Online
Article
Text
id pubmed-8523280
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Hindawi
record_format MEDLINE/PubMed
spelling pubmed-8523280 2021-10-19 iMPT-FDNPL: Identification of Membrane Protein Types with Functional Domains and a Natural Language Processing Approach Chen, Wei; Chen, Lei; Dai, Qi Comput Math Methods Med Research Article Hindawi 2021-10-11 /pmc/articles/PMC8523280/ /pubmed/34671418 http://dx.doi.org/10.1155/2021/7681497 Text en Copyright © 2021 Wei Chen et al.
https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title iMPT-FDNPL: Identification of Membrane Protein Types with Functional Domains and a Natural Language Processing Approach
topic Research Article