
Collective judgment predicts disease-associated single nucleotide variants

Bibliographic Details
Main Authors: Capriotti, Emidio, Altman, Russ B, Bromberg, Yana
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3839641/
https://www.ncbi.nlm.nih.gov/pubmed/23819846
http://dx.doi.org/10.1186/1471-2164-14-S3-S2
author Capriotti, Emidio
Altman, Russ B
Bromberg, Yana
collection PubMed
description BACKGROUND: In recent years, the number of human genetic variants deposited in publicly available databases has been increasing exponentially. The latest version of dbSNP, for example, contains ~50 million validated single nucleotide variants (SNVs). SNVs make up most of human variation and are often the primary causes of disease. Non-synonymous SNVs (nsSNVs) result in single amino acid substitutions and may affect protein function, often causing disease. Although several methods for detecting nsSNV effects have already been developed, the steady increase in annotated data offers an opportunity to improve prediction accuracy. RESULTS: Here we present a new approach for the detection of disease-associated nsSNVs (Meta-SNP) that integrates four existing methods: PANTHER, PhD-SNP, SIFT and SNAP. We first tested the accuracy of each method on a dataset of 35,766 disease-annotated mutations from 8,667 proteins extracted from the SwissVar database. The four methods reached overall accuracies of 64%-76%, with a Matthews correlation coefficient (MCC) of 0.38-0.53. We then used the outputs of these methods to develop a machine-learning-based approach (Meta-SNP) that discriminates between disease-associated and polymorphic variants. In testing, the combined method reached 79% overall accuracy and 0.59 MCC, ~3% higher accuracy and ~0.05 higher correlation than the best-performing single method. Moreover, for the hardest-to-classify subset of nsSNVs, i.e., variants for which half of the predictors disagreed with the other half, Meta-SNP attained 8% higher accuracy than the best single predictor. CONCLUSIONS: The Meta-SNP algorithm achieves better performance than the best single predictor. This result suggests that the methods used to predict variant-disease associations are orthogonal, each encoding different biologically relevant relationships. Careful combination of predictions from various resources is therefore a good strategy for selecting high-reliability predictions. Indeed, for the subset of nsSNVs on which all predictors agreed (46% of all nsSNVs in the set), our method reached 87% overall accuracy and 0.73 MCC. The Meta-SNP server is freely accessible at http://snps.biofold.org/meta-snp.
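The evaluation summarized in this abstract rests on two standard binary-classification metrics, overall accuracy (TP + TN)/(TP + TN + FP + FN) and the Matthews correlation coefficient (TP·TN − FP·FN)/√((TP+FP)(TP+FN)(TN+FP)(TN+FN)), and on grouping variants by how many of the four predictors agree. The Python sketch below computes both metrics and labels each variant as unanimous, split (2 vs 2) or majority. It assumes binary calls with 1 = disease-associated and 0 = polymorphic; all function names and the toy calls are hypothetical, not real PANTHER, PhD-SNP, SIFT or SNAP output. The `majority_vote` combiner is only a naive baseline, not the machine-learning method used by Meta-SNP: it has no answer on split variants, which is exactly the subset where a trained combiner has room to help.

```python
from collections import Counter
from math import sqrt

# Assumed label convention for this sketch: 1 = disease-associated, 0 = polymorphic.

def confusion(y_true, y_pred):
    """Return (TP, TN, FP, FN) counts for paired binary labels."""
    counts = Counter(zip(y_true, y_pred))
    return counts[(1, 1)], counts[(0, 0)], counts[(0, 1)], counts[(1, 0)]

def accuracy(y_true, y_pred):
    """Fraction of correct calls: (TP + TN) / all."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(y_true, y_pred):
    """Matthews correlation coefficient; 0.0 when any marginal count is zero."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def agreement(votes):
    """Classify one variant by how its 0/1 predictor calls agree."""
    k, n = sum(votes), len(votes)
    if k in (0, n):
        return "unanimous"
    if 2 * k == n:
        return "split"  # e.g. 2 vs 2 with four predictors
    return "majority"

def majority_vote(votes):
    """Naive baseline combiner: returns None on a tie (the 'split' subset)."""
    k, n = sum(votes), len(votes)
    if 2 * k == n:
        return None
    return int(2 * k > n)

# Hypothetical calls from four predictors for five variants (toy data only).
calls = [(1, 1, 1, 1), (0, 0, 0, 0), (1, 1, 0, 0), (1, 1, 1, 0), (0, 1, 0, 0)]
truth = [1, 0, 1, 1, 1]

print([agreement(v) for v in calls])
# ['unanimous', 'unanimous', 'split', 'majority', 'majority']

# Score the baseline only on variants where the vote is decidable.
decided = [(t, majority_vote(v)) for t, v in zip(truth, calls) if majority_vote(v) is not None]
y_true = [t for t, _ in decided]
y_pred = [p for _, p in decided]
print(accuracy(y_true, y_pred))       # 0.75
print(round(mcc(y_true, y_pred), 3))  # 0.577
```

Reporting MCC alongside accuracy is useful here because MCC stays informative when disease-associated and polymorphic variants are unevenly represented, whereas accuracy alone can look good simply by favoring the larger class.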
format Online
Article
Text
id pubmed-3839641
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3839641 2013-12-03 Collective judgment predicts disease-associated single nucleotide variants Capriotti, Emidio Altman, Russ B Bromberg, Yana BMC Genomics Research BioMed Central 2013-05-28 /pmc/articles/PMC3839641/ /pubmed/23819846 http://dx.doi.org/10.1186/1471-2164-14-S3-S2 Text en Copyright © 2013 Capriotti et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Collective judgment predicts disease-associated single nucleotide variants
topic Research