
EagleImp: fast and accurate genome-wide phasing and imputation in a single tool

MOTIVATION: Reference-based phasing and genotype imputation algorithms have been developed with sublinear theoretical runtime behaviour, but runtimes are still high in practice when large genome-wide reference datasets are used. RESULTS: We developed EagleImp, a software based on the methods used in...


Bibliographic Details
Main Authors: Wienbrandt, Lars; Ellinghaus, David
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9665855/
https://www.ncbi.nlm.nih.gov/pubmed/36130053
http://dx.doi.org/10.1093/bioinformatics/btac637
_version_ 1784831377702649856
author Wienbrandt, Lars
Ellinghaus, David
collection PubMed
description MOTIVATION: Reference-based phasing and genotype imputation algorithms have been developed with sublinear theoretical runtime behaviour, but runtimes are still high in practice when large genome-wide reference datasets are used. RESULTS: We developed EagleImp, a software tool based on the methods used in the existing tools Eagle2 and PBWT, which allows accurate and accelerated phasing and imputation in a single tool through algorithmic and technical improvements and new features. We compared the accuracy and runtime of EagleImp with Eagle2, PBWT and prominent imputation servers using whole-genome sequencing data from the 1000 Genomes Project, the Haplotype Reference Consortium and simulated data with 1 million reference genomes. EagleImp was 2–30 times faster (depending on the single or multiprocessor configuration selected and the size of the reference panel) than Eagle2 combined with PBWT, with the same or better phasing and imputation quality in all tested scenarios. For common variants investigated in typical genome-wide association studies, EagleImp provided the same or higher imputation accuracy than the Sanger Imputation Service, Michigan Imputation Server and the newly developed TOPMed Imputation Server, despite their larger (not publicly available) reference panels. Additional features include automated chromosome splitting and memory management at runtime to avoid job aborts, fast reading and writing of large files and various user-configurable algorithm and output options. Due to the technical optimizations, EagleImp can perform fast and accurate reference-based phasing and imputation and is ready for future large reference panels on the order of 1 million genomes. AVAILABILITY AND IMPLEMENTATION: EagleImp is implemented in C++ and freely available for download at https://github.com/ikmb/eagleimp. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-9665855
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9665855 2022-11-16 EagleImp: fast and accurate genome-wide phasing and imputation in a single tool. Wienbrandt, Lars; Ellinghaus, David. Bioinformatics, Original Papers. Oxford University Press 2022-09-20 /pmc/articles/PMC9665855/ /pubmed/36130053 http://dx.doi.org/10.1093/bioinformatics/btac637 Text en © The Author(s) 2022. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title EagleImp: fast and accurate genome-wide phasing and imputation in a single tool
topic Original Papers