CoDP: predicting the impact of unclassified genetic variants in MSH6 by the combination of different properties of the protein

BACKGROUND: Lynch syndrome is a hereditary cancer predisposition syndrome caused by a mutation in one of the DNA mismatch repair (MMR) genes. About 24% of the mutations identified in Lynch syndrome are missense substitutions and the frequency of missense variants in MSH6 is the highest amongst these...

Bibliographic Details
Main authors: Terui, Hiroko; Akagi, Kiwamu; Kawame, Hiroshi; Yura, Kei
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3651391/
https://www.ncbi.nlm.nih.gov/pubmed/23621914
http://dx.doi.org/10.1186/1423-0127-20-25
_version_ 1782269219740581888
author Terui, Hiroko
Akagi, Kiwamu
Kawame, Hiroshi
Yura, Kei
collection PubMed
description BACKGROUND: Lynch syndrome is a hereditary cancer predisposition syndrome caused by a mutation in one of the DNA mismatch repair (MMR) genes. About 24% of the mutations identified in Lynch syndrome are missense substitutions, and the frequency of missense variants is highest in MSH6 among these MMR genes. Because of this high frequency, genetic testing has not been used effectively for MSH6 so far. We therefore developed CoDP (Combination of the Different Properties), a bioinformatics tool to predict the impact of missense variants in MSH6. METHODS: We integrated the prediction results of three methods, namely MAPP, PolyPhen-2 and SIFT. Two other structural properties, namely solvent accessibility and the change in the number of heavy atoms of amino acids in the MSH6 protein, were further combined explicitly. MSH6 germline missense variants classified by their associated clinical and molecular data were used to fit the parameters of the logistic regression model and to assess the prediction. The performance of CoDP was compared with that of other conventional tools, namely MAPP, SIFT, PolyPhen-2 and PON-MMR. RESULTS: A total of 294 germline missense variants were collected from the variant databases and literature. Of these, 34 variants were available for parameter training and the prediction performance test. We integrated the prediction results of MAPP, PolyPhen-2 and SIFT, and two other structural properties, namely solvent accessibility and the change in the number of heavy atoms of amino acids in the MSH6 protein, were further combined explicitly. Variant data classified by their associated clinical and molecular data were used to fit the parameters of the logistic regression model and to assess the prediction. The positive predictive value (PPV), negative predictive value (NPV), sensitivity, specificity and accuracy of the tools were compared on the whole data set.
The PPV of CoDP was 93.3% (14/15), NPV was 94.7% (18/19), specificity was 94.7% (18/19), sensitivity was 93.3% (14/15) and accuracy was 94.1% (32/34). The area under the curve of CoDP was 0.954, that of MAPP for MSH6 was 0.919, that of SIFT was 0.864 and that of PolyPhen-2 HumVar was 0.819. The power of these methods to distinguish between pathogenic and non-pathogenic variants was tested by the Wilcoxon rank sum test (p < 8.9 × 10^-6 for CoDP, p < 3.3 × 10^-5 for MAPP, p < 3.1 × 10^-4 for SIFT and p < 1.2 × 10^-3 for PolyPhen-2 HumVar), and CoDP was shown to outperform the other conventional methods. CONCLUSION: In this paper, we provide a human-curated data set for MSH6 missense variants, and CoDP, the prediction tool, which achieved better accuracy for predicting the impact of missense variants in MSH6 than any other known tool. CoDP is available at http://cib.cf.ocha.ac.jp/CoDP/.
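The five performance figures quoted in the description are consistent with a single 2×2 confusion matrix. The sketch below is illustrative only and is not part of the catalog record or of CoDP itself: the counts (14 true positives, 1 false positive, 18 true negatives, 1 false negative) are inferred from the reported ratios, and the function name `confusion_metrics` is hypothetical.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Return each metric as a (numerator, denominator) pair."""
    return {
        "PPV": (tp, tp + fp),                    # predicted pathogenic that are pathogenic
        "NPV": (tn, tn + fn),                    # predicted neutral that are neutral
        "sensitivity": (tp, tp + fn),            # pathogenic variants correctly flagged
        "specificity": (tn, tn + fp),            # neutral variants correctly cleared
        "accuracy": (tp + tn, tp + fp + tn + fn),
    }


# Assumption: counts inferred from the ratios in the abstract (34 variants total).
metrics = confusion_metrics(tp=14, fp=1, tn=18, fn=1)

for name, (num, den) in metrics.items():
    print(f"{name}: {num}/{den} = {num / den:.1%}")
```

Under these assumed counts the printed ratios match the abstract: 14/15 for PPV and sensitivity, 18/19 for NPV and specificity, and 32/34 for accuracy.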
format Online
Article
Text
id pubmed-3651391
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3651391 2013-05-14 CoDP: predicting the impact of unclassified genetic variants in MSH6 by the combination of different properties of the protein Terui, Hiroko Akagi, Kiwamu Kawame, Hiroshi Yura, Kei J Biomed Sci Research BioMed Central 2013-04-28 /pmc/articles/PMC3651391/ /pubmed/23621914 http://dx.doi.org/10.1186/1423-0127-20-25 Text en Copyright © 2013 Terui et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title CoDP: predicting the impact of unclassified genetic variants in MSH6 by the combination of different properties of the protein
topic Research