
Protein inter-domain linker prediction using Random Forest and amino acid physiochemical properties

BACKGROUND: Protein chains are generally long and consist of multiple domains. Domains are distinct structural units of a protein that can evolve and function independently. The accurate prediction of protein domain linkers and boundaries is often regarded as the initial step of protein tertiary str...


Bibliographic Details
Main Authors:	Shatnawi, Maad; Zaki, Nazar; Yoo, Paul D
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2014
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4290662/ https://www.ncbi.nlm.nih.gov/pubmed/25521329 http://dx.doi.org/10.1186/1471-2105-15-S16-S8

author	Shatnawi, Maad; Zaki, Nazar; Yoo, Paul D
collection	PubMed
description	BACKGROUND: Protein chains are generally long and consist of multiple domains. Domains are distinct structural units of a protein that can evolve and function independently. The accurate prediction of protein domain linkers and boundaries is often regarded as the initial step of protein tertiary structure and function predictions. Such information not only enhances protein-targeted drug development but also reduces the experimental cost of protein analysis by allowing researchers to work on a set of smaller and independent units. In this study, we propose a novel and accurate domain-linker prediction approach based on protein primary structure information only. We utilize a nature-inspired machine-learning model called Random Forest along with a novel domain-linker profile that contains physiochemical and domain-linker information of amino acid sequences. RESULTS: The proposed approach was tested on two well-known benchmark protein datasets and achieved 68% sensitivity and 99% precision, which is better than any existing protein domain-linker predictor. Without applying any data balancing technique such as class weighting and data re-sampling, the proposed approach is able to accurately classify inter-domain linkers from highly imbalanced datasets. CONCLUSION: Our experimental results prove that the proposed approach is useful for domain-linker identification in highly imbalanced single- and multi-domain proteins.
format	Online Article Text
id	pubmed-4290662
institution	National Center for Biotechnology Information
language	English
publishDate	2014
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-4290662 2015-01-15 BMC Bioinformatics Research BioMed Central 2014-12-08 /pmc/articles/PMC4290662/ /pubmed/25521329 http://dx.doi.org/10.1186/1471-2105-15-S16-S8 Text en Copyright © 2014 Shatnawi et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	Protein inter-domain linker prediction using Random Forest and amino acid physiochemical properties
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4290662/ https://www.ncbi.nlm.nih.gov/pubmed/25521329 http://dx.doi.org/10.1186/1471-2105-15-S16-S8
