
Identification of deleterious non-synonymous single nucleotide polymorphisms using sequence-derived information

BACKGROUND: As the number of non-synonymous single nucleotide polymorphisms (nsSNPs), also known as single amino acid polymorphisms (SAPs), increases rapidly, computational methods that can distinguish disease-causing SAPs from neutral SAPs are needed. Many methods have been developed to distinguish...


Bibliographic Details
Main authors:	Hu, Jing; Yan, Changhui
Format:	Text
Language:	English
Published:	BioMed Central 2008
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2446391/ https://www.ncbi.nlm.nih.gov/pubmed/18588693 http://dx.doi.org/10.1186/1471-2105-9-297

author	Hu, Jing; Yan, Changhui
collection	PubMed
description	BACKGROUND: As the number of non-synonymous single nucleotide polymorphisms (nsSNPs), also known as single amino acid polymorphisms (SAPs), increases rapidly, computational methods that can distinguish disease-causing SAPs from neutral SAPs are needed. Many methods have been developed to distinguish disease-causing SAPs based on both structural and sequence features of the mutation point. One limitation of these methods is that they are not applicable to cases where protein structures are not available. In this study, we explore the feasibility of classifying SAPs into disease-causing and neutral mutations using only information derived from protein sequence. RESULTS: We compiled a set of 686 features that were derived from protein sequence. For each feature, the distance between the wild-type residue and the mutant-type residue was computed. A greedy approach was then used to select the features that were useful for the classification of SAPs, and 10 features were selected. Using the selected features, a decision tree method achieves 82.6% overall accuracy with a Matthews Correlation Coefficient (MCC) of 0.607 in cross-validation. When tested on an independent set that was not seen by the method during training and feature selection, the decision tree method achieves 82.6% overall accuracy with an MCC of 0.604. We also evaluated the proposed method on all SAPs obtained from Swiss-Prot, where it achieves an MCC of 0.42 with 73.2% overall accuracy. This method allows users to make reliable predictions when protein structures are not available. Unlike previous studies, in which only a small set of features was arbitrarily chosen and considered, here we used an automated method to systematically discover useful features from a large set of features well annotated in public databases. CONCLUSION: The proposed method is a useful tool for the classification of SAPs, especially when the structure of the protein is not available.
format	Text
id	pubmed-2446391
institution	National Center for Biotechnology Information
language	English
publishDate	2008
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-2446391 2008-07-09 Hu, Jing; Yan, Changhui. BMC Bioinformatics. Research Article. BioMed Central 2008-06-27 /pmc/articles/PMC2446391/ /pubmed/18588693 http://dx.doi.org/10.1186/1471-2105-9-297 Text en Copyright © 2008 Hu and Yan; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Identification of deleterious non-synonymous single nucleotide polymorphisms using sequence-derived information
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2446391/ https://www.ncbi.nlm.nih.gov/pubmed/18588693 http://dx.doi.org/10.1186/1471-2105-9-297
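
The abstract in this record reports results as overall accuracy and Matthews Correlation Coefficient (MCC) and mentions a greedy approach to feature selection, but it does not spell out either procedure. The following is a minimal sketch, not the authors' implementation: `mcc` and `accuracy` use the standard confusion-matrix definitions, and `greedy_forward_selection` is a generic forward-selection loop. All function names are hypothetical, and the `score` callable is an assumed stand-in for whatever evaluation the study used (for example, cross-validated MCC of a decision tree trained on the candidate feature subset).

```python
import math
from typing import Callable, List, Sequence


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: return 0.0 when any marginal total is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Overall accuracy: fraction of all predictions that are correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def greedy_forward_selection(
    candidates: Sequence[str],
    score: Callable[[List[str]], float],
    max_features: int = 10,
) -> List[str]:
    """Add one feature at a time, keeping the one that raises `score` most.

    Assumption: `score` takes a list of feature names and returns a number
    where higher is better. Selection stops when no remaining feature
    improves the score or when `max_features` have been chosen.
    """
    selected: List[str] = []
    remaining = list(candidates)
    best = float("-inf")
    while remaining and len(selected) < max_features:
        # Evaluate every remaining feature added to the current subset.
        trials = [(score(selected + [f]), f) for f in remaining]
        top_score, top_feature = max(trials, key=lambda pair: pair[0])
        if top_score <= best:
            break  # no candidate improves on the current subset
        selected.append(top_feature)
        remaining.remove(top_feature)
        best = top_score
    return selected
```

In use, `score` would wrap model training and evaluation, so each call is expensive; with 686 candidate features and up to 10 rounds, a forward search of this kind needs at most a few thousand evaluations rather than an exhaustive search over subsets.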
