
DEFLATE Compression Algorithm Corrects for Overestimation of Phylogenetic Diversity by Grantham Approach to Single-Nucleotide Polymorphism Classification


Bibliographic Details
Main Authors: Schlosberg, Arran; Lam, Brian Y. H.; Yeo, Giles S. H.; Clifton-Bligh, Roderick J.
Format: Online Article Text
Language: English
Published: Molecular Diversity Preservation International (MDPI) 2014
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4057744/
https://www.ncbi.nlm.nih.gov/pubmed/24828207
http://dx.doi.org/10.3390/ijms15058491
collection PubMed
description Improvements in speed and cost of genome sequencing are resulting in increasing numbers of novel non-synonymous single nucleotide polymorphisms (nsSNPs) in genes known to be associated with disease. The large number of nsSNPs makes laboratory-based classification infeasible and familial co-segregation with disease is not always possible. In-silico methods for classification or triage are thus utilised. A popular tool based on multiple-species sequence alignments (MSAs) and work by Grantham, Align-GVGD, has been shown to underestimate deleterious effects, particularly as sequence numbers increase. We utilised the DEFLATE compression algorithm to account for expected variation across a number of species. With the adjusted Grantham measure we derived a means of quantitatively clustering known neutral and deleterious nsSNPs from the same gene; this was then used to assign novel variants to the most appropriate cluster as a means of binary classification. Scaling of clusters allows for inter-gene comparison of variants through a single pathogenicity score. The approach improves upon the classification accuracy of Align-GVGD while correcting for sensitivity to large MSAs. Open-source code and a web server are made available at https://github.com/aschlosberg/CompressGV.
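The abstract does not spell out how the DEFLATE output feeds into the adjusted Grantham measure, so the following is only a minimal sketch of the general intuition, not the CompressGV implementation: a column of a multiple-species alignment that varies a lot compresses poorly, while a conserved column compresses well. The helper name `deflate_ratio`, the 20-letter alphabet string, and the two example columns are assumptions made for illustration; only Python's standard `zlib` and `random` modules are used.

```python
import random
import zlib

# The 20 standard amino acids (one-letter codes); used only to build toy data.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def deflate_ratio(column: str) -> float:
    """Return compressed size / raw size for one alignment column.

    Hypothetical helper: a lower ratio means the column is more
    repetitive (more conserved across species).
    """
    raw = column.encode("ascii")
    # wbits=-15 asks zlib for a raw DEFLATE stream without header or checksum.
    compressor = zlib.compressobj(level=9, wbits=-15)
    compressed = compressor.compress(raw) + compressor.flush()
    return len(compressed) / len(raw)


# Toy columns for 200 hypothetical species (not real alignment data).
conserved = "L" * 200
rng = random.Random(0)  # fixed seed so the example is reproducible
variable = "".join(rng.choices(AMINO_ACIDS, k=200))

print("conserved column ratio:", deflate_ratio(conserved))
print("variable column ratio: ", deflate_ratio(variable))
```

In this toy setting the conserved column yields the smaller ratio. How such a quantity would be combined with Grantham-style physicochemical distances is left to the paper and the linked repository.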
id pubmed-4057744
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-4057744 2014-06-16 Int J Mol Sci Article Molecular Diversity Preservation International (MDPI) 2014-05-13 /pmc/articles/PMC4057744/ /pubmed/24828207 http://dx.doi.org/10.3390/ijms15058491 Text en © 2014 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).