Using variant databases for variant prioritization and to detect erroneous genotype-phenotype associations

BACKGROUND: In the search for novel causal mutations, public and/or private variant databases are nearly always used to facilitate the search as they result in a massive reduction of putative variants in one step. Practically, variant filtering is often done by either using all variants from the var...

Bibliographic Details
Main Authors:	Broeckx, Bart J. G.; Peelman, Luc; Saunders, Jimmy H.; Deforce, Dieter; Clement, Lieven
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2017
Subjects:	Methodology Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5710091/ https://www.ncbi.nlm.nih.gov/pubmed/29191167 http://dx.doi.org/10.1186/s12859-017-1951-y

author	Broeckx, Bart J. G.; Peelman, Luc; Saunders, Jimmy H.; Deforce, Dieter; Clement, Lieven
author_sort	Broeckx, Bart J. G.
collection	PubMed
description	BACKGROUND: In the search for novel causal mutations, public and/or private variant databases are nearly always used to facilitate the search, as they result in a massive reduction of putative variants in one step. In practice, variant filtering is often done either by using all variants from the variant database (called the absence-approach, i.e. it is assumed that disease-causing variants do not reside in variant databases) or by using the subset of variants with an allelic frequency > 1% (called the 1%-approach). We investigate the validity of these two approaches in terms of false negatives (the true disease-causing variant does not pass all filters) and false positives (a harmless mutation passes all filters and is erroneously retained in the list of putative disease-causing variants) and compare them with a novel approach, which we named the quantile-based approach. This approach applies variable instead of static frequency thresholds, and the calculation of these thresholds is based on prior knowledge of disease prevalence, inheritance models, database size and database characteristics. RESULTS: Based on real-life data, we demonstrate that the quantile-based approach outperforms the absence-approach in terms of false negatives. At the same time, the quantile-based approach deals more appropriately than the 1%-approach with the variable allele frequencies of disease-causing alleles in variant databases, and as such allows better control of the number of false positives. We also introduce an alternative application for variant database usage and the quantile-based approach. When disease-causing variants in variant databases deviated substantially from the theoretical expectations calculated with the quantile-based approach, the association between genotype and phenotype had to be reconsidered in 12 out of 13 cases.
CONCLUSIONS: We developed a novel method and demonstrated that this so-called quantile-based approach is a highly suitable method for variant filtering. In addition, the quantile-based approach can also be used for variant flagging. For user friendliness, lookup tables and easy-to-use R calculators are provided. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi: 10.1186/s12859-017-1951-y) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-5710091
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-5710091 2017-12-06 BMC Bioinformatics Methodology Article BioMed Central 2017-12-01 /pmc/articles/PMC5710091/ /pubmed/29191167 http://dx.doi.org/10.1186/s12859-017-1951-y Text en © The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	Using variant databases for variant prioritization and to detect erroneous genotype-phenotype associations
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5710091/ https://www.ncbi.nlm.nih.gov/pubmed/29191167 http://dx.doi.org/10.1186/s12859-017-1951-y
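
The abstract describes frequency thresholds that vary with disease prevalence, inheritance model and database size. The sketch below is one possible way to illustrate that idea in Python; it is not the authors' R calculator. It assumes Hardy-Weinberg equilibrium, a single fully penetrant causal allele, and a database that is a random population sample, and it takes the threshold to be an upper binomial quantile of the causal allele count. The function names and the 95% default level are hypothetical choices made for illustration.

```python
import math


def causal_allele_frequency(prevalence: float, inheritance: str) -> float:
    """Allele frequency implied by disease prevalence under Hardy-Weinberg.

    Assumes one fully penetrant causal allele (an illustrative simplification).
    """
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    if inheritance == "recessive":
        return math.sqrt(prevalence)  # affected fraction = q^2
    if inheritance == "dominant":
        return 1.0 - math.sqrt(1.0 - prevalence)  # affected = 1 - (1 - q)^2
    raise ValueError("inheritance must be 'recessive' or 'dominant'")


def binomial_quantile(n: int, p: float, level: float) -> int:
    """Smallest k such that P(X <= k) >= level for X ~ Binomial(n, p)."""
    log_p, log_q = math.log(p), math.log1p(-p)
    cdf = 0.0
    for k in range(n + 1):
        # Work in log space so large databases do not underflow.
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1)
                   - math.lgamma(n - k + 1) + k * log_p + (n - k) * log_q)
        cdf += math.exp(log_pmf)
        if cdf >= level:
            return k
    return n


def max_allele_count(prevalence: float, inheritance: str,
                     n_individuals: int, level: float = 0.95) -> int:
    """Largest causal allele count still considered plausible in the database."""
    q = causal_allele_frequency(prevalence, inheritance)
    return binomial_quantile(2 * n_individuals, q, level)  # diploid: 2 alleles each


# Hypothetical example: recessive disease, prevalence 1 in 10,000,
# database of 1,000 individuals (2,000 alleles).
threshold = max_allele_count(1 / 10_000, "recessive", 1_000)
print("max plausible allele count:", threshold)
print("equivalent frequency threshold:", threshold / 2_000)
```

Under these assumptions, a candidate variant seen more often than `threshold` would be filtered out, whereas the absence-approach would discard any variant seen at all and the 1%-approach would apply the same cutoff regardless of prevalence or inheritance model.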