An improved deep learning model for hierarchical classification of protein families

Although genes carry information, proteins play the main role in providing the functions of a living organism. A large number of different proteins are involved in every function that occurs in a cell. These amino acid sequences can be hierarchically classified into a set of families and...

Bibliographic Details
Main authors: Sandaruwan, Pahalage Dhanushka, Wannige, Champi Thusangi
Format: Online Article Text
Language: English
Published: Public Library of Science 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8528337/
https://www.ncbi.nlm.nih.gov/pubmed/34669708
http://dx.doi.org/10.1371/journal.pone.0258625
author Sandaruwan, Pahalage Dhanushka
Wannige, Champi Thusangi
collection PubMed
description Although genes carry information, proteins play the main role in providing the functions of a living organism. A large number of different proteins are involved in every function that occurs in a cell. These amino acid sequences can be hierarchically classified into families and subfamilies according to their evolutionary relatedness and similarities in structure or function. Protein characterization, which identifies protein structure and function, is done accurately using laboratory experiments. With the rapidly growing number of novel protein sequences, these experiments have become difficult to carry out because they are expensive, time-consuming, and laborious. Therefore, many computational methods have been introduced to classify proteins and predict their functional properties. As computational techniques have improved, deep learning has come to play a key role in many areas. Deep learning models such as DeepFam and ProtCNN have recently been presented to classify proteins into their families. However, these models perform only non-hierarchical classification of proteins. In this research, we propose a deep learning neural network model named DeepHiFam that classifies proteins hierarchically into different levels simultaneously with high accuracy. The model achieved an accuracy of 98.38% for protein family classification and more than 80% accuracy for the classification of protein subfamilies and sub-subfamilies. Further, DeepHiFam performed well in the non-hierarchical classification of protein families, achieving accuracies of 98.62% and 96.14% on the popular Pfam and COG datasets, respectively.
format Online
Article
Text
id pubmed-8528337
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-8528337 2021-10-21 An improved deep learning model for hierarchical classification of protein families Sandaruwan, Pahalage Dhanushka Wannige, Champi Thusangi PLoS One Research Article
Public Library of Science 2021-10-20 /pmc/articles/PMC8528337/ /pubmed/34669708 http://dx.doi.org/10.1371/journal.pone.0258625 Text en © 2021 Sandaruwan, Wannige https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title An improved deep learning model for hierarchical classification of protein families
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8528337/
https://www.ncbi.nlm.nih.gov/pubmed/34669708
http://dx.doi.org/10.1371/journal.pone.0258625