Predicting cancer-associated germline variations in proteins

BACKGROUND: Various computational methods are presently available to classify whether a protein variation is disease-associated or not. However, data derived from recent technological advancements make it feasible to extend the annotation of disease-associated variations in order to include specific...

Bibliographic Details
Main authors: Martelli, Pier Luigi, Fariselli, Piero, Balzani, Eva, Casadio, Rita
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3372458/
https://www.ncbi.nlm.nih.gov/pubmed/22759656
http://dx.doi.org/10.1186/1471-2164-13-S4-S8
_version_ 1782235349810937856
author Martelli, Pier Luigi
Fariselli, Piero
Balzani, Eva
Casadio, Rita
author_sort Martelli, Pier Luigi
collection PubMed
description BACKGROUND: Various computational methods are currently available to classify whether a protein variation is disease-associated or not. However, data derived from recent technological advances make it feasible to extend the annotation of disease-associated variations to include specific phenotypes. Here we tackle the problem of distinguishing between genetic variations associated with cancer and variations associated with other genetic diseases. RESULTS: We implement a new method based on Support Vector Machines that takes as input the protein variant and the protein function, as described by its associated Gene Ontology terms. Our approach succeeds in discriminating germline variants that are likely to be cancer-associated from those that are related to other genetic disorders. The method achieves 90% accuracy and a Matthews correlation coefficient of 0.61 on a set comprising 6478 germline variations (16% of which are cancer-associated) in 592 proteins. The sensitivity and the specificity on the cancer class are 69% and 66%, respectively. Furthermore, the method correctly excludes some 96% of 3392 somatic cancer-associated variations in 1983 proteins not included in the training/testing set. CONCLUSIONS: We show that a large set of cancer-associated germline protein variations can be successfully discriminated from those associated with other genetic disorders. This is a further step in the process of protein variant annotation. Scoring improves substantially when protein function, as encoded by Gene Ontology terms, is considered, corroborating the role of protein function as a key feature for the correct annotation of its variations.
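The figures reported in the abstract can be cross-checked against each other with a short calculation. The sketch below is illustrative only and is not taken from the article: it assumes that "specificity on the cancer class" denotes precision (the fraction of predicted cancer variants that are truly cancer-associated), and the function names are hypothetical. From the reported class fraction, sensitivity, and assumed precision, it rebuilds approximate confusion-matrix counts and recomputes accuracy and the Matthews correlation coefficient.

```python
import math


def confusion_from_rates(n_total, prevalence, sensitivity, precision):
    """Approximate confusion-matrix counts from summary rates.

    Assumption (not stated in the record): 'precision' stands in for the
    value the abstract calls specificity on the cancer class.
    """
    positives = n_total * prevalence          # cancer-associated variants
    tp = sensitivity * positives              # correctly predicted cancer
    fn = positives - tp                       # missed cancer variants
    fp = tp * (1.0 - precision) / precision   # other-disease variants called cancer
    tn = (n_total - positives) - fp           # correctly predicted other-disease
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Fraction of all variants that are classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient for a binary confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Values from the abstract: 6478 variants, 16% cancer-associated,
# 69% sensitivity, 66% read here as precision (an assumption).
tp, fp, fn, tn = confusion_from_rates(6478, 0.16, 0.69, 0.66)
acc = accuracy(tp, fp, fn, tn)
coef = mcc(tp, fp, fn, tn)
print(f"accuracy ~ {acc:.2f}, MCC ~ {coef:.2f}")
```

Under this reading the recomputed values land close to the reported 90% accuracy and 0.61 coefficient, allowing for rounding of the inputs. If 66% were instead taken as the true-negative rate, overall accuracy would be about 0.69 × 0.16 + 0.66 × 0.84 ≈ 0.66, which does not match the reported accuracy.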
format Online
Article
Text
id pubmed-3372458
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3372458 2012-06-13 Predicting cancer-associated germline variations in proteins Martelli, Pier Luigi Fariselli, Piero Balzani, Eva Casadio, Rita BMC Genomics Proceedings
BioMed Central 2012-06-18 /pmc/articles/PMC3372458/ /pubmed/22759656 http://dx.doi.org/10.1186/1471-2164-13-S4-S8 Text en Copyright ©2012 Martelli et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Predicting cancer-associated germline variations in proteins
topic Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3372458/
https://www.ncbi.nlm.nih.gov/pubmed/22759656
http://dx.doi.org/10.1186/1471-2164-13-S4-S8