Comparing methods for constructing and representing human pangenome graphs

BACKGROUND: As a single reference genome cannot possibly represent all the variation present across human individuals, pangenome graphs have been introduced to incorporate population diversity within a wide range of genomic analyses. Several data structures have been proposed for representing collec...

Bibliographic Details
Main Authors: Andreace, Francesco; Lechat, Pierre; Dufresne, Yoann; Chikhi, Rayan
Format: Online Article Text
Language: English
Published: BioMed Central 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10691155/
https://www.ncbi.nlm.nih.gov/pubmed/38037131
http://dx.doi.org/10.1186/s13059-023-03098-2
_version_ 1785152683334696960
author Andreace, Francesco
Lechat, Pierre
Dufresne, Yoann
Chikhi, Rayan
author_facet Andreace, Francesco
Lechat, Pierre
Dufresne, Yoann
Chikhi, Rayan
author_sort Andreace, Francesco
collection PubMed
description BACKGROUND: As a single reference genome cannot possibly represent all the variation present across human individuals, pangenome graphs have been introduced to incorporate population diversity within a wide range of genomic analyses. Several data structures have been proposed for representing collections of genomes as pangenomes, in particular graphs. RESULTS: In this work, we collect all publicly available high-quality human haplotypes and construct the largest human pangenome graphs to date, incorporating 52 individuals in addition to two synthetic references (CHM13 and GRCh38). We build variation graphs and de Bruijn graphs of this collection using five of the state-of-the-art tools: Bifrost, mdbg, Minigraph, Minigraph-Cactus and pggb. We examine differences in the way each of these tools represents variations between input sequences, both in terms of overall graph structure and representation of specific genetic loci. CONCLUSION: This work sheds light on key differences between pangenome graph representations, informing end-users on how to select the most appropriate graph type for their application. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13059-023-03098-2.
format Online
Article
Text
id pubmed-10691155
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-10691155 2023-12-02 Comparing methods for constructing and representing human pangenome graphs Andreace, Francesco Lechat, Pierre Dufresne, Yoann Chikhi, Rayan Genome Biol Research BACKGROUND: As a single reference genome cannot possibly represent all the variation present across human individuals, pangenome graphs have been introduced to incorporate population diversity within a wide range of genomic analyses. Several data structures have been proposed for representing collections of genomes as pangenomes, in particular graphs. RESULTS: In this work, we collect all publicly available high-quality human haplotypes and construct the largest human pangenome graphs to date, incorporating 52 individuals in addition to two synthetic references (CHM13 and GRCh38). We build variation graphs and de Bruijn graphs of this collection using five of the state-of-the-art tools: Bifrost, mdbg, Minigraph, Minigraph-Cactus and pggb. We examine differences in the way each of these tools represents variations between input sequences, both in terms of overall graph structure and representation of specific genetic loci. CONCLUSION: This work sheds light on key differences between pangenome graph representations, informing end-users on how to select the most appropriate graph type for their application. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13059-023-03098-2. BioMed Central 2023-11-30 /pmc/articles/PMC10691155/ /pubmed/38037131 http://dx.doi.org/10.1186/s13059-023-03098-2 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle Research
Andreace, Francesco
Lechat, Pierre
Dufresne, Yoann
Chikhi, Rayan
Comparing methods for constructing and representing human pangenome graphs
title Comparing methods for constructing and representing human pangenome graphs
title_full Comparing methods for constructing and representing human pangenome graphs
title_fullStr Comparing methods for constructing and representing human pangenome graphs
title_full_unstemmed Comparing methods for constructing and representing human pangenome graphs
title_short Comparing methods for constructing and representing human pangenome graphs
title_sort comparing methods for constructing and representing human pangenome graphs
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10691155/
https://www.ncbi.nlm.nih.gov/pubmed/38037131
http://dx.doi.org/10.1186/s13059-023-03098-2
work_keys_str_mv AT andreacefrancesco comparingmethodsforconstructingandrepresentinghumanpangenomegraphs
AT lechatpierre comparingmethodsforconstructingandrepresentinghumanpangenomegraphs
AT dufresneyoann comparingmethodsforconstructingandrepresentinghumanpangenomegraphs
AT chikhirayan comparingmethodsforconstructingandrepresentinghumanpangenomegraphs