VarBin, a novel method for classifying true and false positive variants in NGS data
BACKGROUND: Variant discovery for rare genetic diseases using Illumina genome or exome sequencing involves screening of up to millions of variants to find only the one or few causative variant(s). Sequencing or alignment errors create "false positive" variants, which are often retained in...
Main authors: | Durtschi, Jacob; Margraf, Rebecca L; Coonrod, Emily M; Mallempati, Kalyan C; Voelkerding, Karl V |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2013 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3849648/ https://www.ncbi.nlm.nih.gov/pubmed/24266885 http://dx.doi.org/10.1186/1471-2105-14-S13-S2 |
_version_ | 1782293968432660480 |
---|---|
author | Durtschi, Jacob Margraf, Rebecca L Coonrod, Emily M Mallempati, Kalyan C Voelkerding, Karl V |
author_facet | Durtschi, Jacob Margraf, Rebecca L Coonrod, Emily M Mallempati, Kalyan C Voelkerding, Karl V |
author_sort | Durtschi, Jacob |
collection | PubMed |
description | BACKGROUND: Variant discovery for rare genetic diseases using Illumina genome or exome sequencing involves screening of up to millions of variants to find only the one or few causative variant(s). Sequencing or alignment errors create "false positive" variants, which are often retained in the variant screening process. Methods to remove false positive variants often retain many false positive variants. This report presents VarBin, a method to prioritize variants based on a false positive variant likelihood prediction. METHODS: VarBin uses the Genome Analysis Toolkit variant calling software to calculate the variant-to-wild type genotype likelihood ratio at each variant change and position divided by read depth. The resulting Phred-scaled, likelihood-ratio by depth (PLRD) was used to segregate variants into 4 Bins with Bin 1 variants most likely true and Bin 4 most likely false positive. PLRD values were calculated for a proband of interest and 41 additional Illumina HiSeq, exome and whole genome samples (proband's family or unrelated samples). At variant sites without apparent sequencing or alignment error, wild type/non-variant calls cluster near -3 PLRD and variant calls typically cluster above 10 PLRD. Sites with systematic variant calling problems (evident by variant quality scores and biases as well as displayed on the iGV viewer) tend to have higher and more variable wild type/non-variant PLRD values. Depending on the separation of a proband's variant PLRD value from the cluster of wild type/non-variant PLRD values for background samples at the same variant change and position, the VarBin method's classification is assigned to each proband variant (Bin 1 to Bin 4). RESULTS: To assess VarBin performance, Sanger sequencing was performed on 98 variants in the proband and background samples. True variants were confirmed in 97% of Bin 1 variants, 30% of Bin 2, and 0% of Bin 3/Bin 4. CONCLUSIONS: These data indicate that VarBin correctly classifies the majority of true variants as Bin 1 and Bin 3/4 contained only false positive variants. The "uncertain" Bin 2 contained both true and false positive variants. Future work will further differentiate the variants in Bin 2. |
format | Online Article Text |
id | pubmed-3849648 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-38496482013-12-06 VarBin, a novel method for classifying true and false positive variants in NGS data Durtschi, Jacob Margraf, Rebecca L Coonrod, Emily M Mallempati, Kalyan C Voelkerding, Karl V BMC Bioinformatics Research BACKGROUND: Variant discovery for rare genetic diseases using Illumina genome or exome sequencing involves screening of up to millions of variants to find only the one or few causative variant(s). Sequencing or alignment errors create "false positive" variants, which are often retained in the variant screening process. Methods to remove false positive variants often retain many false positive variants. This report presents VarBin, a method to prioritize variants based on a false positive variant likelihood prediction. METHODS: VarBin uses the Genome Analysis Toolkit variant calling software to calculate the variant-to-wild type genotype likelihood ratio at each variant change and position divided by read depth. The resulting Phred-scaled, likelihood-ratio by depth (PLRD) was used to segregate variants into 4 Bins with Bin 1 variants most likely true and Bin 4 most likely false positive. PLRD values were calculated for a proband of interest and 41 additional Illumina HiSeq, exome and whole genome samples (proband's family or unrelated samples). At variant sites without apparent sequencing or alignment error, wild type/non-variant calls cluster near -3 PLRD and variant calls typically cluster above 10 PLRD. Sites with systematic variant calling problems (evident by variant quality scores and biases as well as displayed on the iGV viewer) tend to have higher and more variable wild type/non-variant PLRD values. Depending on the separation of a proband's variant PLRD value from the cluster of wild type/non-variant PLRD values for background samples at the same variant change and position, the VarBin method's classification is assigned to each proband variant (Bin 1 to Bin 4). RESULTS: To assess VarBin performance, Sanger sequencing was performed on 98 variants in the proband and background samples. True variants were confirmed in 97% of Bin 1 variants, 30% of Bin 2, and 0% of Bin 3/Bin 4. CONCLUSIONS: These data indicate that VarBin correctly classifies the majority of true variants as Bin 1 and Bin 3/4 contained only false positive variants. The "uncertain" Bin 2 contained both true and false positive variants. Future work will further differentiate the variants in Bin 2. BioMed Central 2013-10-01 /pmc/articles/PMC3849648/ /pubmed/24266885 http://dx.doi.org/10.1186/1471-2105-14-S13-S2 Text en Copyright © 2013 Durtschi et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Durtschi, Jacob Margraf, Rebecca L Coonrod, Emily M Mallempati, Kalyan C Voelkerding, Karl V VarBin, a novel method for classifying true and false positive variants in NGS data |
title | VarBin, a novel method for classifying true and false positive variants in NGS data |
title_full | VarBin, a novel method for classifying true and false positive variants in NGS data |
title_fullStr | VarBin, a novel method for classifying true and false positive variants in NGS data |
title_full_unstemmed | VarBin, a novel method for classifying true and false positive variants in NGS data |
title_short | VarBin, a novel method for classifying true and false positive variants in NGS data |
title_sort | varbin, a novel method for classifying true and false positive variants in ngs data |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3849648/ https://www.ncbi.nlm.nih.gov/pubmed/24266885 http://dx.doi.org/10.1186/1471-2105-14-S13-S2 |
work_keys_str_mv | AT durtschijacob varbinanovelmethodforclassifyingtrueandfalsepositivevariantsinngsdata AT margrafrebeccal varbinanovelmethodforclassifyingtrueandfalsepositivevariantsinngsdata AT coonrodemilym varbinanovelmethodforclassifyingtrueandfalsepositivevariantsinngsdata AT mallempatikalyanc varbinanovelmethodforclassifyingtrueandfalsepositivevariantsinngsdata AT voelkerdingkarlv varbinanovelmethodforclassifyingtrueandfalsepositivevariantsinngsdata |
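
The METHODS summary in the description defines PLRD as a Phred-scaled variant-to-wild type genotype likelihood ratio divided by read depth, and bins a proband variant by how far its PLRD sits from the background wild type cluster. The Python sketch below is one possible reading of that description, not the authors' implementation. Several points are assumptions: that the inputs are Phred-scaled genotype likelihoods of the kind variant callers report (where a value equals -10·log10 of the likelihood), that the "variant" genotype is whichever of the heterozygous or homozygous-alternate genotypes is more likely, that separation is measured from the highest background wild type value, and that the bin cutoffs (10, 5, 0) are purely hypothetical placeholders, since the abstract gives no thresholds. All function names are illustrative.

```python
def plrd(pl_ref, pl_het, pl_hom_alt, depth):
    """Phred-scaled variant-to-wild-type likelihood ratio divided by read depth.

    Inputs are Phred-scaled genotype likelihoods (PL = -10 * log10(likelihood)),
    so a difference of two PL values is a Phred-scaled likelihood ratio.
    """
    if depth <= 0:
        raise ValueError("depth must be positive")
    # Assumption: the variant genotype is the more likely of het / hom-alt.
    best_variant_pl = min(pl_het, pl_hom_alt)
    return (pl_ref - best_variant_pl) / depth


def separation(proband_plrd, background_wt_plrds):
    """Gap between the proband PLRD and the top of the background wild type cluster.

    Assumption: the cluster edge is taken as the maximum background value.
    """
    return proband_plrd - max(background_wt_plrds)


def assign_bin(sep, cutoffs=(10.0, 5.0, 0.0)):
    """Map a separation to Bin 1..4 using HYPOTHETICAL cutoffs (not from the paper)."""
    for bin_number, cutoff in enumerate(cutoffs, start=1):
        if sep >= cutoff:
            return bin_number
    return 4


# A clean wild type call at depth 30: each read adds about 3 Phred units against het.
wild_type_value = plrd(pl_ref=0, pl_het=90, pl_hom_alt=900, depth=30)

# A heterozygous proband call at depth 30.
proband_value = plrd(pl_ref=400, pl_het=0, pl_hom_alt=500, depth=30)

clean_background = [-3.0, -2.9, -3.1]   # tight wild type cluster
noisy_background = [-3.0, 4.0, 9.0]     # site with systematic calling problems

clean_bin = assign_bin(separation(proband_value, clean_background))
noisy_bin = assign_bin(separation(proband_value, noisy_background))
```

Under these assumptions the clean wild type call evaluates to -3 PLRD, consistent with the cluster location reported in the description, and the same proband value lands in a less confident bin when the background wild type values at that site are high and variable.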