EAGLE: Explicit Alternative Genome Likelihood Evaluator
BACKGROUND: Reliable detection of genome variations, especially insertions and deletions (indels), from single sample DNA sequencing data remains challenging, partially due to the inherent uncertainty involved in aligning sequencing reads to the reference genome. In practice a variety of ad hoc qual...
Main authors: | Kuo, Tony; Frith, Martin C.; Sese, Jun; Horton, Paul |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2018 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5918433/ https://www.ncbi.nlm.nih.gov/pubmed/29697369 http://dx.doi.org/10.1186/s12920-018-0342-1 |
_version_ | 1783317415021510656 |
---|---|
author | Kuo, Tony Frith, Martin C. Sese, Jun Horton, Paul |
author_facet | Kuo, Tony Frith, Martin C. Sese, Jun Horton, Paul |
author_sort | Kuo, Tony |
collection | PubMed |
description | BACKGROUND: Reliable detection of genome variations, especially insertions and deletions (indels), from single sample DNA sequencing data remains challenging, partially due to the inherent uncertainty involved in aligning sequencing reads to the reference genome. In practice a variety of ad hoc quality filtering methods are employed to produce more reliable lists of putative variants, but the resulting lists typically still include numerous false positives. Thus it would be desirable to be able to rigorously evaluate the degree to which each putative variant is supported by the data. Unfortunately, users who wish to do this, e.g. for the purpose of prioritizing validation experiments, have been faced with limited options. RESULTS: Here we present EAGLE, a method for evaluating the degree to which sequencing data supports a given candidate genome variant. EAGLE incorporates candidate variants into explicit hypotheses about the individual’s genome, and then computes the probability of the observed data (the sequencing reads) under each hypothesis. In comparison with methods which rely heavily on a particular alignment of the reads to the reference genome, EAGLE readily accounts for uncertainties that may arise from multi-mapping or local misalignment and uses the entire length of each read. We compared the scores assigned by several well-known variant callers to EAGLE for the task of ranking true putative variants on both simulated data and real genome sequencing based benchmarks. For indels, EAGLE obtained marked improvement on simulated data and a whole genome sequencing benchmark, and modest but statistically significant improvement on an exome sequencing benchmark. CONCLUSIONS: EAGLE ranked true variants higher than the scores reported by the callers and can be used to improve specificity in variant calling. EAGLE is freely available at https://github.com/tony-kuo/eagle.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12920-018-0342-1) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-5918433 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-59184332018-04-30 EAGLE: Explicit Alternative Genome Likelihood Evaluator Kuo, Tony Frith, Martin C. Sese, Jun Horton, Paul BMC Med Genomics Research BACKGROUND: Reliable detection of genome variations, especially insertions and deletions (indels), from single sample DNA sequencing data remains challenging, partially due to the inherent uncertainty involved in aligning sequencing reads to the reference genome. In practice a variety of ad hoc quality filtering methods are employed to produce more reliable lists of putative variants, but the resulting lists typically still include numerous false positives. Thus it would be desirable to be able to rigorously evaluate the degree to which each putative variant is supported by the data. Unfortunately, users who wish to do this, e.g. for the purpose of prioritizing validation experiments, have been faced with limited options. RESULTS: Here we present EAGLE, a method for evaluating the degree to which sequencing data supports a given candidate genome variant. EAGLE incorporates candidate variants into explicit hypotheses about the individual’s genome, and then computes the probability of the observed data (the sequencing reads) under each hypothesis. In comparison with methods which rely heavily on a particular alignment of the reads to the reference genome, EAGLE readily accounts for uncertainties that may arise from multi-mapping or local misalignment and uses the entire length of each read. We compared the scores assigned by several well-known variant callers to EAGLE for the task of ranking true putative variants on both simulated data and real genome sequencing based benchmarks. For indels, EAGLE obtained marked improvement on simulated data and a whole genome sequencing benchmark, and modest but statistically significant improvement on an exome sequencing benchmark. 
CONCLUSIONS: EAGLE ranked true variants higher than the scores reported by the callers and can be used to improve specificity in variant calling. EAGLE is freely available at https://github.com/tony-kuo/eagle. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12920-018-0342-1) contains supplementary material, which is available to authorized users. BioMed Central 2018-04-20 /pmc/articles/PMC5918433/ /pubmed/29697369 http://dx.doi.org/10.1186/s12920-018-0342-1 Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Kuo, Tony Frith, Martin C. Sese, Jun Horton, Paul EAGLE: Explicit Alternative Genome Likelihood Evaluator |
title | EAGLE: Explicit Alternative Genome Likelihood Evaluator |
title_full | EAGLE: Explicit Alternative Genome Likelihood Evaluator |
title_fullStr | EAGLE: Explicit Alternative Genome Likelihood Evaluator |
title_full_unstemmed | EAGLE: Explicit Alternative Genome Likelihood Evaluator |
title_short | EAGLE: Explicit Alternative Genome Likelihood Evaluator |
title_sort | eagle: explicit alternative genome likelihood evaluator |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5918433/ https://www.ncbi.nlm.nih.gov/pubmed/29697369 http://dx.doi.org/10.1186/s12920-018-0342-1 |
work_keys_str_mv | AT kuotony eagleexplicitalternativegenomelikelihoodevaluator AT frithmartinc eagleexplicitalternativegenomelikelihoodevaluator AT sesejun eagleexplicitalternativegenomelikelihoodevaluator AT hortonpaul eagleexplicitalternativegenomelikelihoodevaluator |