EAGLE: Explicit Alternative Genome Likelihood Evaluator

BACKGROUND: Reliable detection of genome variations, especially insertions and deletions (indels), from single sample DNA sequencing data remains challenging, partially due to the inherent uncertainty involved in aligning sequencing reads to the reference genome. In practice a variety of ad hoc qual...

Bibliographic Details
Main Authors: Kuo, Tony; Frith, Martin C.; Sese, Jun; Horton, Paul
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5918433/
https://www.ncbi.nlm.nih.gov/pubmed/29697369
http://dx.doi.org/10.1186/s12920-018-0342-1
_version_ 1783317415021510656
author Kuo, Tony
Frith, Martin C.
Sese, Jun
Horton, Paul
author_facet Kuo, Tony
Frith, Martin C.
Sese, Jun
Horton, Paul
author_sort Kuo, Tony
collection PubMed
description BACKGROUND: Reliable detection of genome variations, especially insertions and deletions (indels), from single sample DNA sequencing data remains challenging, partially due to the inherent uncertainty involved in aligning sequencing reads to the reference genome. In practice a variety of ad hoc quality filtering methods are employed to produce more reliable lists of putative variants, but the resulting lists typically still include numerous false positives. Thus it would be desirable to be able to rigorously evaluate the degree to which each putative variant is supported by the data. Unfortunately, users who wish to do this, e.g. for the purpose of prioritizing validation experiments, have been faced with limited options. RESULTS: Here we present EAGLE, a method for evaluating the degree to which sequencing data supports a given candidate genome variant. EAGLE incorporates candidate variants into explicit hypotheses about the individual’s genome, and then computes the probability of the observed data (the sequencing reads) under each hypothesis. In comparison with methods which rely heavily on a particular alignment of the reads to the reference genome, EAGLE readily accounts for uncertainties that may arise from multi-mapping or local misalignment and uses the entire length of each read. We compared the scores assigned by several well-known variant callers to EAGLE for the task of ranking true putative variants on both simulated data and real genome sequencing based benchmarks. For indels, EAGLE obtained marked improvement on simulated data and a whole genome sequencing benchmark, and modest but statistically significant improvement on an exome sequencing benchmark. CONCLUSIONS: EAGLE ranked true variants higher than the scores reported by the callers and can be used to improve specificity in variant calling. EAGLE is freely available at https://github.com/tony-kuo/eagle.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12920-018-0342-1) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5918433
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-59184332018-04-30 EAGLE: Explicit Alternative Genome Likelihood Evaluator Kuo, Tony Frith, Martin C. Sese, Jun Horton, Paul BMC Med Genomics Research BACKGROUND: Reliable detection of genome variations, especially insertions and deletions (indels), from single sample DNA sequencing data remains challenging, partially due to the inherent uncertainty involved in aligning sequencing reads to the reference genome. In practice a variety of ad hoc quality filtering methods are employed to produce more reliable lists of putative variants, but the resulting lists typically still include numerous false positives. Thus it would be desirable to be able to rigorously evaluate the degree to which each putative variant is supported by the data. Unfortunately, users who wish to do this, e.g. for the purpose of prioritizing validation experiments, have been faced with limited options. RESULTS: Here we present EAGLE, a method for evaluating the degree to which sequencing data supports a given candidate genome variant. EAGLE incorporates candidate variants into explicit hypotheses about the individual’s genome, and then computes the probability of the observed data (the sequencing reads) under each hypothesis. In comparison with methods which rely heavily on a particular alignment of the reads to the reference genome, EAGLE readily accounts for uncertainties that may arise from multi-mapping or local misalignment and uses the entire length of each read. We compared the scores assigned by several well-known variant callers to EAGLE for the task of ranking true putative variants on both simulated data and real genome sequencing based benchmarks. For indels, EAGLE obtained marked improvement on simulated data and a whole genome sequencing benchmark, and modest but statistically significant improvement on an exome sequencing benchmark. 
CONCLUSIONS: EAGLE ranked true variants higher than the scores reported by the callers and can be used to improve specificity in variant calling. EAGLE is freely available at https://github.com/tony-kuo/eagle. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12920-018-0342-1) contains supplementary material, which is available to authorized users. BioMed Central 2018-04-20 /pmc/articles/PMC5918433/ /pubmed/29697369 http://dx.doi.org/10.1186/s12920-018-0342-1 Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research
Kuo, Tony
Frith, Martin C.
Sese, Jun
Horton, Paul
EAGLE: Explicit Alternative Genome Likelihood Evaluator
title EAGLE: Explicit Alternative Genome Likelihood Evaluator
title_full EAGLE: Explicit Alternative Genome Likelihood Evaluator
title_fullStr EAGLE: Explicit Alternative Genome Likelihood Evaluator
title_full_unstemmed EAGLE: Explicit Alternative Genome Likelihood Evaluator
title_short EAGLE: Explicit Alternative Genome Likelihood Evaluator
title_sort eagle: explicit alternative genome likelihood evaluator
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5918433/
https://www.ncbi.nlm.nih.gov/pubmed/29697369
http://dx.doi.org/10.1186/s12920-018-0342-1
work_keys_str_mv AT kuotony eagleexplicitalternativegenomelikelihoodevaluator
AT frithmartinc eagleexplicitalternativegenomelikelihoodevaluator
AT sesejun eagleexplicitalternativegenomelikelihoodevaluator
AT hortonpaul eagleexplicitalternativegenomelikelihoodevaluator