
DeepFam: deep learning based alignment-free method for protein family modeling and prediction

MOTIVATION: A large number of newly sequenced proteins are generated by the next-generation sequencing technologies and the biochemical function assignment of the proteins is an important task. However, biological experiments are too expensive to characterize such a large number of protein sequences...


Bibliographic Details
Main authors: Seo, Seokjun; Oh, Minsik; Park, Youngjune; Kim, Sun
Format: Online Article Text
Language: English
Published: Oxford University Press 2018
Subjects: Ismb 2018–Intelligent Systems for Molecular Biology Proceedings
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6022622/
https://www.ncbi.nlm.nih.gov/pubmed/29949966
http://dx.doi.org/10.1093/bioinformatics/bty275
_version_ 1783335717789761536
author Seo, Seokjun
Oh, Minsik
Park, Youngjune
Kim, Sun
collection PubMed
description MOTIVATION: A large number of newly sequenced proteins are generated by the next-generation sequencing technologies and the biochemical function assignment of the proteins is an important task. However, biological experiments are too expensive to characterize such a large number of protein sequences, thus protein function prediction is primarily done by computational modeling methods, such as profile Hidden Markov Model (pHMM) and k-mer based methods. Nevertheless, existing methods have some limitations; k-mer based methods are not accurate enough to assign protein functions and pHMM is not fast enough to handle large number of protein sequences from numerous genome projects. Therefore, a more accurate and faster protein function prediction method is needed. RESULTS: In this paper, we introduce DeepFam, an alignment-free method that can extract functional information directly from sequences without the need of multiple sequence alignments. In extensive experiments using the Clusters of Orthologous Groups (COGs) and G protein-coupled receptor (GPCR) dataset, DeepFam achieved better performance in terms of accuracy and runtime for predicting functions of proteins compared to the state-of-the-art methods, both alignment-free and alignment-based methods. Additionally, we showed that DeepFam has a power of capturing conserved regions to model protein families. In fact, DeepFam was able to detect conserved regions documented in the Prosite database while predicting functions of proteins. Our deep learning method will be useful in characterizing functions of the ever increasing protein sequences. AVAILABILITY AND IMPLEMENTATION: Codes are available at https://bhi-kimlab.github.io/DeepFam.
format Online
Article
Text
id pubmed-6022622
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6022622 2018-07-10 DeepFam: deep learning based alignment-free method for protein family modeling and prediction Seo, Seokjun Oh, Minsik Park, Youngjune Kim, Sun Bioinformatics Ismb 2018–Intelligent Systems for Molecular Biology Proceedings
Oxford University Press 2018-07-01 2018-06-27 /pmc/articles/PMC6022622/ /pubmed/29949966 http://dx.doi.org/10.1093/bioinformatics/bty275 Text en © The Author(s) 2018. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title DeepFam: deep learning based alignment-free method for protein family modeling and prediction
topic Ismb 2018–Intelligent Systems for Molecular Biology Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6022622/
https://www.ncbi.nlm.nih.gov/pubmed/29949966
http://dx.doi.org/10.1093/bioinformatics/bty275