PDP-CON: prediction of domain/linker residues in protein sequences using a consensus approach

The prediction of domain/linker residues in protein sequences is a crucial task in the functional classification of proteins, homology-based protein structure prediction, and high-throughput structural genomics. In this work, a novel consensus-based machine-learning technique was applied for residue...

Bibliographic Details
Main authors: Chatterjee, Piyali; Basu, Subhadip; Zubek, Julian; Kundu, Mahantapas; Nasipuri, Mita; Plewczynski, Dariusz
Format: Online Article Text
Language: English
Published: Springer Berlin Heidelberg 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4788683/
https://www.ncbi.nlm.nih.gov/pubmed/26969678
http://dx.doi.org/10.1007/s00894-016-2933-0
author Chatterjee, Piyali
Basu, Subhadip
Zubek, Julian
Kundu, Mahantapas
Nasipuri, Mita
Plewczynski, Dariusz
author_sort Chatterjee, Piyali
collection PubMed
description The prediction of domain/linker residues in protein sequences is a crucial task in the functional classification of proteins, homology-based protein structure prediction, and high-throughput structural genomics. In this work, a novel consensus-based machine-learning technique was applied for residue-level prediction of the domain/linker annotations in protein sequences using ordered/disordered regions along protein chains and a set of physicochemical properties. Six different classifiers—decision tree, Gaussian naïve Bayes, linear discriminant analysis, support vector machine, random forest, and multilayer perceptron—were exhaustively explored for the residue-level prediction of domain/linker regions. The protein sequences from the curated CATH database were used for training and cross-validation experiments. Test results obtained by applying the developed PDP-CON tool to the mutually exclusive, independent proteins of the CASP-8, CASP-9, and CASP-10 databases are reported. An n-star quality consensus approach was used to combine the results yielded by different classifiers. The average PDP-CON accuracy and F-measure values for the CASP targets were found to be 0.86 and 0.91, respectively. The dataset, source code, and all supplementary materials for this work are available at https://cmaterju.org/cmaterbioinfo/ for noncommercial use. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1007/s00894-016-2933-0) contains supplementary material, which is available to authorized users.
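The consensus step mentioned in the description can be illustrated with a minimal sketch. It assumes that "n-star" means a residue is labeled positive when at least n of the six classifiers agree; the abstract does not define the rule, so this threshold interpretation, the function name `n_star_consensus`, the label encoding (1 = linker, 0 = domain), and the example votes are all hypothetical and not taken from the PDP-CON source code.

```python
from typing import Dict, List


def n_star_consensus(votes: Dict[str, List[int]], n: int) -> List[int]:
    """Combine per-residue binary votes from several classifiers.

    Assumption (not from the paper): a residue is labeled 1 (linker)
    when at least n classifiers vote 1, otherwise 0 (domain).
    """
    if not votes:
        raise ValueError("at least one classifier is required")
    if len({len(v) for v in votes.values()}) != 1:
        raise ValueError("all classifiers must label the same residues")
    if not 1 <= n <= len(votes):
        raise ValueError("n must be between 1 and the number of classifiers")
    # zip(*...) regroups the votes so each tuple holds one residue's votes.
    return [1 if sum(column) >= n else 0 for column in zip(*votes.values())]


# Hypothetical votes for a five-residue sequence (illustrative data only).
example_votes = {
    "decision_tree": [1, 0, 1, 0, 0],
    "gaussian_nb":   [1, 0, 0, 0, 1],
    "lda":           [1, 1, 1, 0, 0],
    "svm":           [1, 0, 1, 0, 0],
    "random_forest": [1, 0, 1, 0, 0],
    "mlp":           [0, 0, 1, 1, 0],
}

three_star = n_star_consensus(example_votes, 3)
```

Under this assumed rule, raising n makes the consensus stricter: fewer residues are labeled positive, but each positive label is backed by more classifiers.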
format Online
Article
Text
id pubmed-4788683
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Springer Berlin Heidelberg
record_format MEDLINE/PubMed
spelling pubmed-4788683 2016-04-09 PDP-CON: prediction of domain/linker residues in protein sequences using a consensus approach Chatterjee, Piyali Basu, Subhadip Zubek, Julian Kundu, Mahantapas Nasipuri, Mita Plewczynski, Dariusz J Mol Model Original Paper
Springer Berlin Heidelberg 2016-03-11 2016 /pmc/articles/PMC4788683/ /pubmed/26969678 http://dx.doi.org/10.1007/s00894-016-2933-0 Text en © The Author(s) 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
title PDP-CON: prediction of domain/linker residues in protein sequences using a consensus approach
topic Original Paper