Predicting deleterious nsSNPs: an analysis of sequence and structural attributes

BACKGROUND: There has been an explosion in the number of single nucleotide polymorphisms (SNPs) within public databases. In this study we focused on non-synonymous protein coding single nucleotide polymorphisms (nsSNPs), some associated with disease and others which are thought to be neutral. We des...

Bibliographic Details
Main authors:	Dobson, Richard J; Munroe, Patricia B; Caulfield, Mark J; Saqi, Mansoor AS
Format:	Text
Language:	English
Published:	BioMed Central 2006
Subjects:	Methodology Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1489951/ https://www.ncbi.nlm.nih.gov/pubmed/16630345 http://dx.doi.org/10.1186/1471-2105-7-217

_version_	1782128369600561152
author	Dobson, Richard J; Munroe, Patricia B; Caulfield, Mark J; Saqi, Mansoor AS
author_facet	Dobson, Richard J; Munroe, Patricia B; Caulfield, Mark J; Saqi, Mansoor AS
author_sort	Dobson, Richard J
collection	PubMed
description	BACKGROUND: There has been an explosion in the number of single nucleotide polymorphisms (SNPs) within public databases. In this study we focused on non-synonymous protein coding single nucleotide polymorphisms (nsSNPs), some associated with disease and others which are thought to be neutral. We describe the distribution of both types of nsSNPs using structural and sequence based features and assess the relative value of these attributes as predictors of function using machine learning methods. We also address the common problem of balance within machine learning methods and show the effect of imbalance on nsSNP function prediction. We show that nsSNP function prediction can be significantly improved by 100% undersampling of the majority class. The learnt rules were then applied to make predictions of function on all nsSNPs within Ensembl. RESULTS: The measure of prediction success is greatly affected by the level of imbalance in the training dataset. We found the balanced dataset that included all attributes produced the best prediction. The performance as measured by the Matthews correlation coefficient (MCC) varied between 0.49 and 0.25 depending on the imbalance. As previously observed, the degree of sequence conservation at the nsSNP position is the single most useful attribute. In addition to conservation, structural predictions made using a balanced dataset can be of value. CONCLUSION: The predictions for all nsSNPs within Ensembl, based on a balanced dataset using all attributes, are available as a DAS annotation. Instructions for adding the track to Ensembl are at
format	Text
id	pubmed-1489951
institution	National Center for Biotechnology Information
language	English
publishDate	2006
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-14899512006-07-10 Predicting deleterious nsSNPs: an analysis of sequence and structural attributes Dobson, Richard J Munroe, Patricia B Caulfield, Mark J Saqi, Mansoor AS BMC Bioinformatics Methodology Article BACKGROUND: There has been an explosion in the number of single nucleotide polymorphisms (SNPs) within public databases. In this study we focused on non-synonymous protein coding single nucleotide polymorphisms (nsSNPs), some associated with disease and others which are thought to be neutral. We describe the distribution of both types of nsSNPs using structural and sequence based features and assess the relative value of these attributes as predictors of function using machine learning methods. We also address the common problem of balance within machine learning methods and show the effect of imbalance on nsSNP function prediction. We show that nsSNP function prediction can be significantly improved by 100% undersampling of the majority class. The learnt rules were then applied to make predictions of function on all nsSNPs within Ensembl. RESULTS: The measure of prediction success is greatly affected by the level of imbalance in the training dataset. We found the balanced dataset that included all attributes produced the best prediction. The performance as measured by the Matthews correlation coefficient (MCC) varied between 0.49 and 0.25 depending on the imbalance. As previously observed, the degree of sequence conservation at the nsSNP position is the single most useful attribute. In addition to conservation, structural predictions made using a balanced dataset can be of value. CONCLUSION: The predictions for all nsSNPs within Ensembl, based on a balanced dataset using all attributes, are available as a DAS annotation. 
Instructions for adding the track to Ensembl are at BioMed Central 2006-04-21 /pmc/articles/PMC1489951/ /pubmed/16630345 http://dx.doi.org/10.1186/1471-2105-7-217 Text en Copyright © 2006 Dobson et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( (http://creativecommons.org/licenses/by/2.0) ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Methodology Article Dobson, Richard J Munroe, Patricia B Caulfield, Mark J Saqi, Mansoor AS Predicting deleterious nsSNPs: an analysis of sequence and structural attributes
title	Predicting deleterious nsSNPs: an analysis of sequence and structural attributes
title_full	Predicting deleterious nsSNPs: an analysis of sequence and structural attributes
title_fullStr	Predicting deleterious nsSNPs: an analysis of sequence and structural attributes
title_full_unstemmed	Predicting deleterious nsSNPs: an analysis of sequence and structural attributes
title_short	Predicting deleterious nsSNPs: an analysis of sequence and structural attributes
title_sort	predicting deleterious nssnps: an analysis of sequence and structural attributes
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1489951/ https://www.ncbi.nlm.nih.gov/pubmed/16630345 http://dx.doi.org/10.1186/1471-2105-7-217
work_keys_str_mv	AT dobsonrichardj predictingdeleteriousnssnpsananalysisofsequenceandstructuralattributes AT munroepatriciab predictingdeleteriousnssnpsananalysisofsequenceandstructuralattributes AT caulfieldmarkj predictingdeleteriousnssnpsananalysisofsequenceandstructuralattributes AT saqimansooras predictingdeleteriousnssnpsananalysisofsequenceandstructuralattributes
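
The abstract in this record reports performance as a Matthews correlation coefficient (MCC) and attributes the improvement to undersampling the majority class until the training set is balanced. The sketch below illustrates both ideas in plain Python. It is not the authors' code: the function names `mcc` and `undersample_majority` are hypothetical, the reading of "100% undersampling" as "reduce every class to the size of the smallest class" is an assumption, and returning 0.0 when the MCC denominator is zero is a common convention rather than something stated in the record.

```python
import math
import random


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Convention (assumption): an undefined MCC is reported as 0.0.
        return 0.0
    return (tp * tn - fp * fn) / denom


def undersample_majority(samples, labels, seed=0):
    """Randomly discard samples so every class matches the smallest class.

    Returns a shuffled list of (sample, label) pairs.
    """
    rng = random.Random(seed)

    # Group samples by their class label.
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)

    # Size of the smallest (minority) class.
    n_min = min(len(items) for items in by_class.values())

    balanced = []
    for label, items in by_class.items():
        # Keep the class whole if it is already minority-sized,
        # otherwise draw n_min items without replacement.
        kept = items if len(items) == n_min else rng.sample(items, n_min)
        balanced.extend((sample, label) for sample in kept)

    rng.shuffle(balanced)
    return balanced


if __name__ == "__main__":
    # Hypothetical toy data: 90 "neutral" and 10 "disease" variants.
    samples = list(range(100))
    labels = ["neutral"] * 90 + ["disease"] * 10

    balanced = undersample_majority(samples, labels)
    print(len(balanced))

    # A classifier that always predicts "neutral" is 90% accurate on the
    # imbalanced set, yet its MCC shows no predictive value.
    print(mcc(tp=0, tn=90, fp=0, fn=10))
```

The always-"neutral" case at the end shows why accuracy alone is misleading on imbalanced data and why a measure such as MCC, which uses all four confusion-matrix cells, is the more informative figure to compare across training sets with different class ratios.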
