Predicting cancer-associated germline variations in proteins
BACKGROUND: Various computational methods are presently available to classify whether a protein variation is disease-associated or not. However data derived from recent technological advancements make it feasible to extend the annotation of disease-associated variations in order to include specific...
Main authors: | Martelli, Pier Luigi; Fariselli, Piero; Balzani, Eva; Casadio, Rita |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2012 |
Subjects: | Proceedings |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3372458/ https://www.ncbi.nlm.nih.gov/pubmed/22759656 http://dx.doi.org/10.1186/1471-2164-13-S4-S8 |
_version_ | 1782235349810937856 |
---|---|
author | Martelli, Pier Luigi Fariselli, Piero Balzani, Eva Casadio, Rita |
author_facet | Martelli, Pier Luigi Fariselli, Piero Balzani, Eva Casadio, Rita |
author_sort | Martelli, Pier Luigi |
collection | PubMed |
description | BACKGROUND: Various computational methods are presently available to classify whether a protein variation is disease-associated or not. However data derived from recent technological advancements make it feasible to extend the annotation of disease-associated variations in order to include specific phenotypes. Here we tackle the problem of distinguishing between genetic variations associated to cancer and variations associated to other genetic diseases. RESULTS: We implement a new method based on Support Vector Machines that takes as input the protein variant and the protein function, as described by its associated Gene Ontology terms. Our approach succeeds in discriminating between germline variants that are likely to be cancer-associated from those that are related to other genetic disorders. The method performs with values of 90% accuracy and 0.61 Matthews correlation coefficient on a set comprising 6478 germline variations (16% are cancer-associated) in 592 proteins. The sensitivity and the specificity on the cancer class are 69% and 66%, respectively. Furthermore the method is capable of correctly excluding some 96% of 3392 somatic cancer-associated variations in 1983 proteins not included in the training/testing set. CONCLUSIONS: Here we prove feasible that a large set of cancer associated germline protein variations can be successfully discriminated from those associated to other genetic disorders. This is a step further in the process of protein variant annotation. Scoring largely improves when protein function as encoded by Gene Ontology terms is considered, corroborating the role of protein function as a key feature for a correct annotation of its variations. |
format | Online Article Text |
id | pubmed-3372458 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2012 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-33724582012-06-13 Predicting cancer-associated germline variations in proteins Martelli, Pier Luigi Fariselli, Piero Balzani, Eva Casadio, Rita BMC Genomics Proceedings BACKGROUND: Various computational methods are presently available to classify whether a protein variation is disease-associated or not. However data derived from recent technological advancements make it feasible to extend the annotation of disease-associated variations in order to include specific phenotypes. Here we tackle the problem of distinguishing between genetic variations associated to cancer and variations associated to other genetic diseases. RESULTS: We implement a new method based on Support Vector Machines that takes as input the protein variant and the protein function, as described by its associated Gene Ontology terms. Our approach succeeds in discriminating between germline variants that are likely to be cancer-associated from those that are related to other genetic disorders. The method performs with values of 90% accuracy and 0.61 Matthews correlation coefficient on a set comprising 6478 germline variations (16% are cancer-associated) in 592 proteins. The sensitivity and the specificity on the cancer class are 69% and 66%, respectively. Furthermore the method is capable of correctly excluding some 96% of 3392 somatic cancer-associated variations in 1983 proteins not included in the training/testing set. CONCLUSIONS: Here we prove feasible that a large set of cancer associated germline protein variations can be successfully discriminated from those associated to other genetic disorders. This is a step further in the process of protein variant annotation. Scoring largely improves when protein function as encoded by Gene Ontology terms is considered, corroborating the role of protein function as a key feature for a correct annotation of its variations. 
BioMed Central 2012-06-18 /pmc/articles/PMC3372458/ /pubmed/22759656 http://dx.doi.org/10.1186/1471-2164-13-S4-S8 Text en Copyright ©2012 Martelli et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Proceedings Martelli, Pier Luigi Fariselli, Piero Balzani, Eva Casadio, Rita Predicting cancer-associated germline variations in proteins |
title | Predicting cancer-associated germline variations in proteins |
title_full | Predicting cancer-associated germline variations in proteins |
title_fullStr | Predicting cancer-associated germline variations in proteins |
title_full_unstemmed | Predicting cancer-associated germline variations in proteins |
title_short | Predicting cancer-associated germline variations in proteins |
title_sort | predicting cancer-associated germline variations in proteins |
topic | Proceedings |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3372458/ https://www.ncbi.nlm.nih.gov/pubmed/22759656 http://dx.doi.org/10.1186/1471-2164-13-S4-S8 |
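
The performance figures quoted in the description field (90% accuracy, 0.61 Matthews correlation coefficient, 69% sensitivity, 66% specificity, 16% of 6478 variations cancer-associated) can be cross-checked against each other. The sketch below is not the authors' code. It assumes that "specificity on the cancer class" is used in the sense of precision, TP/(TP+FP), which the abstract does not define, and that "16%" is exact. The function names are hypothetical. Under those assumptions it rebuilds an approximate confusion matrix and recomputes accuracy and the Matthews correlation coefficient.

```python
import math


def confusion_from_rates(n_total, positive_fraction, sensitivity, precision):
    """Rebuild approximate (non-integer) confusion-matrix counts.

    Assumption: `precision` is TP / (TP + FP); the record calls this
    value "specificity" without defining it.
    """
    positives = n_total * positive_fraction
    negatives = n_total - positives
    tp = sensitivity * positives      # sensitivity = TP / (TP + FN)
    fn = positives - tp
    fp = tp / precision - tp          # precision = TP / (TP + FP)
    tn = negatives - fp
    return tp, fp, tn, fn


def matthews_cc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


# Values taken from the description field of this record.
tp, fp, tn, fn = confusion_from_rates(
    n_total=6478, positive_fraction=0.16, sensitivity=0.69, precision=0.66
)
accuracy = (tp + tn) / (tp + fp + tn + fn)
mcc = matthews_cc(tp, fp, tn, fn)

# For contrast: accuracy if 66% were instead read as TN / (TN + FP).
accuracy_if_true_negative_rate = 0.16 * 0.69 + 0.84 * 0.66

print(f"accuracy = {accuracy:.3f}, MCC = {mcc:.3f}")
print(f"accuracy under the TN-rate reading = {accuracy_if_true_negative_rate:.3f}")
```

Under the precision reading, the recomputed accuracy and MCC land close to the reported 90% and 0.61. Reading 66% as a true-negative rate gives an accuracy far below 90%, so the precision interpretation appears to be the one consistent with the other figures.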