VarBin, a novel method for classifying true and false positive variants in NGS data

BACKGROUND: Variant discovery for rare genetic diseases using Illumina genome or exome sequencing involves screening of up to millions of variants to find only the one or few causative variant(s). Sequencing or alignment errors create "false positive" variants, which are often retained in...

Bibliographic Details
Main Authors:	Durtschi, Jacob; Margraf, Rebecca L; Coonrod, Emily M; Mallempati, Kalyan C; Voelkerding, Karl V
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2013
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3849648/ https://www.ncbi.nlm.nih.gov/pubmed/24266885 http://dx.doi.org/10.1186/1471-2105-14-S13-S2

collection	PubMed
description	BACKGROUND: Variant discovery for rare genetic diseases using Illumina genome or exome sequencing involves screening of up to millions of variants to find only the one or few causative variant(s). Sequencing or alignment errors create "false positive" variants, which are often retained in the variant screening process. Methods to remove false positive variants often retain many false positive variants. This report presents VarBin, a method to prioritize variants based on a false positive variant likelihood prediction.
METHODS: VarBin uses the Genome Analysis Toolkit variant calling software to calculate the variant-to-wild type genotype likelihood ratio at each variant change and position divided by read depth. The resulting Phred-scaled, likelihood-ratio by depth (PLRD) was used to segregate variants into 4 Bins with Bin 1 variants most likely true and Bin 4 most likely false positive. PLRD values were calculated for a proband of interest and 41 additional Illumina HiSeq, exome and whole genome samples (proband's family or unrelated samples). At variant sites without apparent sequencing or alignment error, wild type/non-variant calls cluster near -3 PLRD and variant calls typically cluster above 10 PLRD. Sites with systematic variant calling problems (evident by variant quality scores and biases as well as displayed on the IGV viewer) tend to have higher and more variable wild type/non-variant PLRD values. Depending on the separation of a proband's variant PLRD value from the cluster of wild type/non-variant PLRD values for background samples at the same variant change and position, the VarBin method's classification is assigned to each proband variant (Bin 1 to Bin 4).
RESULTS: To assess VarBin performance, Sanger sequencing was performed on 98 variants in the proband and background samples. True variants were confirmed in 97% of Bin 1 variants, 30% of Bin 2, and 0% of Bin 3/Bin 4.
CONCLUSIONS: These data indicate that VarBin correctly classifies the majority of true variants as Bin 1 and Bin 3/4 contained only false positive variants. The "uncertain" Bin 2 contained both true and false positive variants. Future work will further differentiate the variants in Bin 2.
format	Online Article Text
id	pubmed-3849648
institution	National Center for Biotechnology Information
language	English
publishDate	2013
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3849648 2013-12-06 BMC Bioinformatics Research BioMed Central 2013-10-01 /pmc/articles/PMC3849648/ /pubmed/24266885 http://dx.doi.org/10.1186/1471-2105-14-S13-S2 Text en Copyright © 2013 Durtschi et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	VarBin, a novel method for classifying true and false positive variants in NGS data
topic	Research
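
The METHODS paragraph of the abstract defines PLRD as the Phred-scaled variant-to-wild type genotype likelihood ratio divided by read depth. A minimal Python sketch of that arithmetic follows. It is an illustration under stated assumptions, not the authors' implementation: the function names `plrd` and `toy_log10_likelihoods`, the two-allele error model, and the `error_rate` default are all hypothetical, and the record does not describe how the Genome Analysis Toolkit likelihoods are extracted or how the Bin 1 to Bin 4 boundaries are chosen, so neither is modeled here.

```python
import math


def plrd(log10_lik_variant: float, log10_lik_wild_type: float, depth: int) -> float:
    """Phred-scaled variant-to-wild-type likelihood ratio divided by read depth.

    Inputs are log10 genotype likelihoods; in practice these would come from a
    variant caller (assumption: any caller that reports genotype likelihoods).
    """
    if depth <= 0:
        raise ValueError("depth must be a positive read count")
    # Phred scaling of a ratio: 10 * log10(L_variant / L_wild_type).
    phred_scaled_ratio = 10.0 * (log10_lik_variant - log10_lik_wild_type)
    return phred_scaled_ratio / depth


def toy_log10_likelihoods(ref_reads: int, alt_reads: int, error_rate: float = 0.001):
    """Hypothetical two-allele diploid model, used only to produce example inputs.

    Returns (log10 L(heterozygous variant), log10 L(homozygous wild type)).
    """
    # Homozygous wild type: a reference read is correct, an alternate read is an error.
    log10_wild_type = (ref_reads * math.log10(1.0 - error_rate)
                       + alt_reads * math.log10(error_rate))
    # Heterozygous: with symmetric errors each read supports either allele with p = 0.5.
    log10_het = (ref_reads + alt_reads) * math.log10(0.5)
    return log10_het, log10_wild_type


# Clean wild-type site: 40 reference reads, no alternate reads.
wild_type_plrd = plrd(*toy_log10_likelihoods(40, 0), depth=40)

# Clean heterozygous site: 20 reference reads, 20 alternate reads.
het_plrd = plrd(*toy_log10_likelihoods(20, 20), depth=40)

print(f"wild type PLRD: {wild_type_plrd:.2f}")
print(f"heterozygous PLRD: {het_plrd:.2f}")
```

Under this toy model the clean wild-type site comes out near -3 and the clean heterozygous site above 10, in line with the clusters the abstract reports, because each reference read contributes 10·log10(0.5) ≈ -3 to the Phred-scaled ratio before dividing by depth.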

