
KinMutRF: a random forest classifier of sequence variants in the human protein kinase superfamily

BACKGROUND: The association between aberrant signal processing by protein kinases and human diseases such as cancer was established a long time ago. However, understanding the link between sequence variants in the protein kinase superfamily and the mechanistic complex traits at the molecular level rem...


Bibliographic Details
Main authors: Pons, Tirso; Vazquez, Miguel; Matey-Hernandez, María Luisa; Brunak, Søren; Valencia, Alfonso; Izarzugaza, Jose MG
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4928150/
https://www.ncbi.nlm.nih.gov/pubmed/27357839
http://dx.doi.org/10.1186/s12864-016-2723-1
_version_ 1782440388706959360
author Pons, Tirso
Vazquez, Miguel
Matey-Hernandez, María Luisa
Brunak, Søren
Valencia, Alfonso
Izarzugaza, Jose MG
author_facet Pons, Tirso
Vazquez, Miguel
Matey-Hernandez, María Luisa
Brunak, Søren
Valencia, Alfonso
Izarzugaza, Jose MG
author_sort Pons, Tirso
collection PubMed
description BACKGROUND: The association between aberrant signal processing by protein kinases and human diseases such as cancer was established a long time ago. However, understanding the link between sequence variants in the protein kinase superfamily and the mechanistic complex traits at the molecular level remains challenging: cells tolerate most genomic alterations, and only a minor fraction disrupt molecular function sufficiently to drive disease. RESULTS: KinMutRF is a novel random-forest method to automatically identify pathogenic variants in human kinases. Twenty-six decision trees implemented as a random forest weigh a battery of features that characterize the variants: a) at the gene level, including membership in a Kinbase group and Gene Ontology terms; b) at the PFAM domain level; and c) at the residue level, the types of amino acids involved, changes in biochemical properties, and functional annotations from UniProt, Phospho.ELM and FireDB. KinMutRF identifies disease-associated variants satisfactorily (Acc: 0.88, Prec: 0.82, Rec: 0.75, F-score: 0.78, MCC: 0.68) when trained and cross-validated with the 3689 human kinase variants from UniProt that have been annotated as neutral or pathogenic. All unclassified variants were excluded from the training set. Furthermore, KinMutRF is discussed with respect to two independent kinase-specific sets of mutations not included in training or testing, Kin-Driver (643 variants) and Pon-BTK (1495 variants). Moreover, we provide predictions for the 848 protein kinase variants in UniProt that remained unclassified. A public implementation of KinMutRF, including documentation and examples, is available online (http://kinmut2.bioinfo.cnio.es). The source code for local installation is released under a GPL version 3 license and can be downloaded from https://github.com/Rbbt-Workflows/KinMut2. CONCLUSIONS: KinMutRF is capable of classifying kinase variation with good performance. Predictions by KinMutRF compare favorably in a benchmark with other state-of-the-art methods (i.e. SIFT, PolyPhen-2, MutationAssessor, MutationTaster, LRT, CADD, FATHMM, and VEST). Kinase-specific features rank as the most elucidatory in terms of information gain and are likely responsible for the improvement in prediction performance. This advocates for the development of family-specific classifiers able to exploit the discriminatory power of features unique to individual protein families. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-016-2723-1) contains supplementary material, which is available to authorized users.
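Note on the reported metrics: the abstract quotes accuracy, precision, recall, F-score and MCC without defining them. The sketch below shows the standard definitions for a binary classifier, assuming that "pathogenic" is the positive class and that the F-score is the usual harmonic mean of precision and recall (F1). The function names and the confusion-matrix counts are hypothetical and chosen only for illustration; they are not taken from the article or from the KinMut2 source code.

```python
import math


def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts.

    tp/fn: pathogenic variants predicted pathogenic/neutral (assumed positive class)
    tn/fp: neutral variants predicted neutral/pathogenic
    """
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Matthews correlation coefficient; defined as 0 when any margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f_score": f_score(precision, recall),
        "mcc": mcc,
    }


# Consistency check on the values quoted in the abstract:
# precision 0.82 and recall 0.75 give an F1 of about 0.78.
print(round(f_score(0.82, 0.75), 2))  # 0.78

# Hypothetical counts, for illustration only (not the article's data).
example = binary_metrics(tp=75, fp=15, tn=185, fn=25)
print({name: round(value, 2) for name, value in example.items()})
```

Under these definitions the quoted precision and recall reproduce the quoted F-score, which suggests the abstract reports a plain F1; the other figures cannot be rechecked without the underlying confusion matrix.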
format Online
Article
Text
id pubmed-4928150
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4928150 2016-06-30 KinMutRF: a random forest classifier of sequence variants in the human protein kinase superfamily Pons, Tirso Vazquez, Miguel Matey-Hernandez, María Luisa Brunak, Søren Valencia, Alfonso Izarzugaza, Jose MG BMC Genomics Methodology Article BACKGROUND: The association between aberrant signal processing by protein kinases and human diseases such as cancer was established a long time ago. However, understanding the link between sequence variants in the protein kinase superfamily and the mechanistic complex traits at the molecular level remains challenging: cells tolerate most genomic alterations, and only a minor fraction disrupt molecular function sufficiently to drive disease. RESULTS: KinMutRF is a novel random-forest method to automatically identify pathogenic variants in human kinases. Twenty-six decision trees implemented as a random forest weigh a battery of features that characterize the variants: a) at the gene level, including membership in a Kinbase group and Gene Ontology terms; b) at the PFAM domain level; and c) at the residue level, the types of amino acids involved, changes in biochemical properties, and functional annotations from UniProt, Phospho.ELM and FireDB. KinMutRF identifies disease-associated variants satisfactorily (Acc: 0.88, Prec: 0.82, Rec: 0.75, F-score: 0.78, MCC: 0.68) when trained and cross-validated with the 3689 human kinase variants from UniProt that have been annotated as neutral or pathogenic. All unclassified variants were excluded from the training set. Furthermore, KinMutRF is discussed with respect to two independent kinase-specific sets of mutations not included in training or testing, Kin-Driver (643 variants) and Pon-BTK (1495 variants). Moreover, we provide predictions for the 848 protein kinase variants in UniProt that remained unclassified. A public implementation of KinMutRF, including documentation and examples, is available online (http://kinmut2.bioinfo.cnio.es). The source code for local installation is released under a GPL version 3 license and can be downloaded from https://github.com/Rbbt-Workflows/KinMut2. CONCLUSIONS: KinMutRF is capable of classifying kinase variation with good performance. Predictions by KinMutRF compare favorably in a benchmark with other state-of-the-art methods (i.e. SIFT, PolyPhen-2, MutationAssessor, MutationTaster, LRT, CADD, FATHMM, and VEST). Kinase-specific features rank as the most elucidatory in terms of information gain and are likely responsible for the improvement in prediction performance. This advocates for the development of family-specific classifiers able to exploit the discriminatory power of features unique to individual protein families. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-016-2723-1) contains supplementary material, which is available to authorized users. BioMed Central 2016-06-23 /pmc/articles/PMC4928150/ /pubmed/27357839 http://dx.doi.org/10.1186/s12864-016-2723-1 Text en © Pons et al. 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Methodology Article
Pons, Tirso
Vazquez, Miguel
Matey-Hernandez, María Luisa
Brunak, Søren
Valencia, Alfonso
Izarzugaza, Jose MG
KinMutRF: a random forest classifier of sequence variants in the human protein kinase superfamily
title KinMutRF: a random forest classifier of sequence variants in the human protein kinase superfamily
title_full KinMutRF: a random forest classifier of sequence variants in the human protein kinase superfamily
title_fullStr KinMutRF: a random forest classifier of sequence variants in the human protein kinase superfamily
title_full_unstemmed KinMutRF: a random forest classifier of sequence variants in the human protein kinase superfamily
title_short KinMutRF: a random forest classifier of sequence variants in the human protein kinase superfamily
title_sort kinmutrf: a random forest classifier of sequence variants in the human protein kinase superfamily
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4928150/
https://www.ncbi.nlm.nih.gov/pubmed/27357839
http://dx.doi.org/10.1186/s12864-016-2723-1
work_keys_str_mv AT ponstirso kinmutrfarandomforestclassifierofsequencevariantsinthehumanproteinkinasesuperfamily
AT vazquezmiguel kinmutrfarandomforestclassifierofsequencevariantsinthehumanproteinkinasesuperfamily
AT mateyhernandezmarialuisa kinmutrfarandomforestclassifierofsequencevariantsinthehumanproteinkinasesuperfamily
AT brunaksøren kinmutrfarandomforestclassifierofsequencevariantsinthehumanproteinkinasesuperfamily
AT valenciaalfonso kinmutrfarandomforestclassifierofsequencevariantsinthehumanproteinkinasesuperfamily
AT izarzugazajosemg kinmutrfarandomforestclassifierofsequencevariantsinthehumanproteinkinasesuperfamily