
The Practical Haplotype Graph, a platform for storing and using pangenomes for imputation

MOTIVATION: Pangenomes provide novel insights for population and quantitative genetics, genomics and breeding not available from studying a single reference genome. Instead, a species is better represented by a pangenome or collection of genomes. Unfortunately, managing and using pangenomes for geno...


Bibliographic Details
Main authors: Bradbury, P J, Casstevens, T, Jensen, S E, Johnson, L C, Miller, Z R, Monier, B, Romay, M C, Song, B, Buckler, E S
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9344836/
https://www.ncbi.nlm.nih.gov/pubmed/35748708
http://dx.doi.org/10.1093/bioinformatics/btac410
author Bradbury, P J
Casstevens, T
Jensen, S E
Johnson, L C
Miller, Z R
Monier, B
Romay, M C
Song, B
Buckler, E S
author_sort Bradbury, P J
collection PubMed
description MOTIVATION: Pangenomes provide novel insights for population and quantitative genetics, genomics and breeding not available from studying a single reference genome. Instead, a species is better represented by a pangenome or collection of genomes. Unfortunately, managing and using pangenomes for genomically diverse species is computationally and practically challenging. We developed a trellis graph representation anchored to the reference genome that represents most pangenomes well and can be used to impute complete genomes from low density sequence or variant data. RESULTS: The Practical Haplotype Graph (PHG) is a pangenome pipeline, database (PostGRES & SQLite), data model (Java, Kotlin or R) and Breeding API (BrAPI) web service. The PHG has already been able to accurately represent diversity in four major crops including maize, one of the most genomically diverse species, with up to 1000-fold data compression. Using simulated data, we show that, at even 0.1× coverage, with appropriate reads and sequence alignment, imputation results in extremely accurate haplotype reconstruction. The PHG is a platform and environment for the understanding and application of genomic diversity. AVAILABILITY AND IMPLEMENTATION: All resources listed here are freely available. The PHG Docker used to generate the simulation results is at https://hub.docker.com/ as maizegenetics/phg:0.0.27. PHG source code is at https://bitbucket.org/bucklerlab/practicalhaplotypegraph/src/master/. The code used for the analysis of simulated data is at https://bitbucket.org/bucklerlab/phg-manuscript/src/master/. The PHG database of NAM parent haplotypes is in the CyVerse data store (https://de.cyverse.org/de/) and named /iplant/home/shared/panzea/panGenome/PHG_db_maize/phg_v5Assemblies_20200608.db. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-9344836
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9344836 2022-08-03 Bioinformatics Original Papers Oxford University Press 2022-06-24 /pmc/articles/PMC9344836/ /pubmed/35748708 http://dx.doi.org/10.1093/bioinformatics/btac410 Text en © The Author(s) 2022. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title The Practical Haplotype Graph, a platform for storing and using pangenomes for imputation
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9344836/
https://www.ncbi.nlm.nih.gov/pubmed/35748708
http://dx.doi.org/10.1093/bioinformatics/btac410