Pangenome-based genome inference allows efficient and accurate genotyping across a wide spectrum of variant classes


Bibliographic Details
Main authors: Ebler, Jana; Ebert, Peter; Clarke, Wayne E.; Rausch, Tobias; Audano, Peter A.; Houwaart, Torsten; Mao, Yafei; Korbel, Jan O.; Eichler, Evan E.; Zody, Michael C.; Dilthey, Alexander T.; Marschall, Tobias
Format: Online Article Text
Language: English
Published: Nature Publishing Group US 2022
Subjects: Technical Report
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9005351/
https://www.ncbi.nlm.nih.gov/pubmed/35410384
http://dx.doi.org/10.1038/s41588-022-01043-w
author Ebler, Jana
Ebert, Peter
Clarke, Wayne E.
Rausch, Tobias
Audano, Peter A.
Houwaart, Torsten
Mao, Yafei
Korbel, Jan O.
Eichler, Evan E.
Zody, Michael C.
Dilthey, Alexander T.
Marschall, Tobias
collection PubMed
description Typical genotyping workflows map reads to a reference genome before identifying genetic variants. Generating such alignments introduces reference biases and comes with substantial computational burden. Furthermore, short-read lengths limit the ability to characterize repetitive genomic regions, which are particularly challenging for fast k-mer-based genotypers. In the present study, we propose a new algorithm, PanGenie, that leverages a haplotype-resolved pangenome reference together with k-mer counts from short-read sequencing data to genotype a wide spectrum of genetic variation—a process we refer to as genome inference. Compared with mapping-based approaches, PanGenie is more than 4 times faster at 30-fold coverage and achieves better genotype concordances for almost all variant types and coverages tested. Improvements are especially pronounced for large insertions (≥50 bp) and variants in repetitive regions, enabling the inclusion of these classes of variants in genome-wide association studies. PanGenie efficiently leverages the increasing amount of haplotype-resolved assemblies to unravel the functional impact of previously inaccessible variants while being faster compared with alignment-based workflows.
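The description above mentions k-mer counts from short-read data as PanGenie's input signal. The following is a minimal Python sketch of that general idea only: it counts k-mers in a few toy reads and sums the counts of k-mers that are specific to one of two candidate haplotype sequences. It is not PanGenie's algorithm or API. The function names, the sequences, and the choice k = 5 are all hypothetical, and reverse complements and sequencing errors are ignored.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every k-mer of seq (this sketch ignores reverse complements)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def count_kmers(reads, k):
    """Count how often each k-mer occurs across all reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def allele_support(allele_seq, other_seq, read_counts, k):
    """Sum read counts over k-mers present in allele_seq but absent from other_seq."""
    unique = set(kmers(allele_seq, k)) - set(kmers(other_seq, k))
    return sum(read_counts[km] for km in unique)


# Hypothetical toy data: two haplotype sequences that differ by one base,
# and three short reads sampled from the alternative haplotype.
ref_hap = "ACGTACGGTCA"
alt_hap = "ACGTATGGTCA"
reads = ["ACGTATGG", "GTATGGTC", "TATGGTCA"]
k = 5

counts = count_kmers(reads, k)
ref_support = allele_support(ref_hap, alt_hap, counts, k)
alt_support = allele_support(alt_hap, ref_hap, counts, k)
print("ref support:", ref_support, "alt support:", alt_support)
```

In this toy case all read k-mers that distinguish the two sequences belong to the alternative haplotype, so only its support is non-zero. A real genotyper would need a statistical model of coverage and error rather than a raw comparison of sums.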
format Online
Article
Text
id pubmed-9005351
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Nature Publishing Group US
record_format MEDLINE/PubMed
spelling pubmed-9005351 2022-04-27 Nat Genet Technical Report
Nature Publishing Group US 2022-04-11 2022 /pmc/articles/PMC9005351/ /pubmed/35410384 http://dx.doi.org/10.1038/s41588-022-01043-w Text en
© The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/
Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title Pangenome-based genome inference allows efficient and accurate genotyping across a wide spectrum of variant classes
topic Technical Report