DEFLATE Compression Algorithm Corrects for Overestimation of Phylogenetic Diversity by Grantham Approach to Single-Nucleotide Polymorphism Classification
Improvements in speed and cost of genome sequencing are resulting in increasing numbers of novel non-synonymous single nucleotide polymorphisms (nsSNPs) in genes known to be associated with disease. The large number of nsSNPs makes laboratory-based classification infeasible and familial co-segregati...
Main authors: | Schlosberg, Arran; Lam, Brian Y. H.; Yeo, Giles S. H.; Clifton-Bligh, Roderick J. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Molecular Diversity Preservation International (MDPI) 2014 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4057744/ https://www.ncbi.nlm.nih.gov/pubmed/24828207 http://dx.doi.org/10.3390/ijms15058491 |
_version_ | 1782321023362793472 |
---|---|
author | Schlosberg, Arran; Lam, Brian Y. H.; Yeo, Giles S. H.; Clifton-Bligh, Roderick J. |
author_sort | Schlosberg, Arran |
collection | PubMed |
description | Improvements in speed and cost of genome sequencing are resulting in increasing numbers of novel non-synonymous single nucleotide polymorphisms (nsSNPs) in genes known to be associated with disease. The large number of nsSNPs makes laboratory-based classification infeasible and familial co-segregation with disease is not always possible. In-silico methods for classification or triage are thus utilised. A popular tool based on multiple-species sequence alignments (MSAs) and work by Grantham, Align-GVGD, has been shown to underestimate deleterious effects, particularly as sequence numbers increase. We utilised the DEFLATE compression algorithm to account for expected variation across a number of species. With the adjusted Grantham measure we derived a means of quantitatively clustering known neutral and deleterious nsSNPs from the same gene; this was then used to assign novel variants to the most appropriate cluster as a means of binary classification. Scaling of clusters allows for inter-gene comparison of variants through a single pathogenicity score. The approach improves upon the classification accuracy of Align-GVGD while correcting for sensitivity to large MSAs. Open-source code and a web server are made available at https://github.com/aschlosberg/CompressGV. |
format | Online Article Text |
id | pubmed-4057744 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | Molecular Diversity Preservation International (MDPI) |
record_format | MEDLINE/PubMed |
spelling | pubmed-4057744 2014-06-16. Schlosberg, Arran; Lam, Brian Y. H.; Yeo, Giles S. H.; Clifton-Bligh, Roderick J. Int J Mol Sci. Article. Molecular Diversity Preservation International (MDPI) 2014-05-13. /pmc/articles/PMC4057744/ /pubmed/24828207 http://dx.doi.org/10.3390/ijms15058491 Text en. © 2014 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/). |
title | DEFLATE Compression Algorithm Corrects for Overestimation of Phylogenetic Diversity by Grantham Approach to Single-Nucleotide Polymorphism Classification |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4057744/ https://www.ncbi.nlm.nih.gov/pubmed/24828207 http://dx.doi.org/10.3390/ijms15058491 |
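The description field above says that DEFLATE compression was used to account for expected variation across species in a multiple-sequence alignment. The record does not spell out how, so the following is only a minimal sketch of the general idea, not the authors' CompressGV implementation: it uses Python's standard `zlib` module to measure how compressible a single alignment column is, on the assumption that a highly conserved column compresses far better than a variable one. The function name `deflate_ratio` and both example columns are hypothetical and chosen purely for illustration.

```python
import random
import zlib


def deflate_ratio(column: str) -> float:
    """Return compressed size / original size for one alignment column.

    A raw DEFLATE stream is used (wbits=-15), so no zlib header or
    checksum bytes are counted in the compressed size.
    """
    raw = column.encode("ascii")
    if not raw:
        raise ValueError("column must not be empty")
    compressor = zlib.compressobj(level=9, wbits=-15)
    compressed = compressor.compress(raw) + compressor.flush()
    return len(compressed) / len(raw)


# Hypothetical columns: one residue per species at a single alignment position.
conserved = "L" * 200  # every species carries leucine
rng = random.Random(0)  # fixed seed so the example is repeatable
variable = "".join(rng.choice("ACDEFGHIKLMNPQRSTVWY") for _ in range(200))

print(f"conserved column ratio: {deflate_ratio(conserved):.3f}")
print(f"variable column ratio:  {deflate_ratio(variable):.3f}")
```

A lower ratio means more redundancy in the column. How such a ratio would be combined with a Grantham-style distance is not stated in this record; the repository linked in the description is the place to check the actual method.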