
Bovine breed-specific augmented reference graphs facilitate accurate sequence read mapping and unbiased variant discovery

Bibliographic Details
Main authors: Crysnanto, Danang; Pausch, Hubert
Format: Online; Article; Text
Language: English
Published: BioMed Central, 2020
Subjects: Research
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7385871/
https://www.ncbi.nlm.nih.gov/pubmed/32718320
http://dx.doi.org/10.1186/s13059-020-02105-0
Collection: PubMed
Abstract:
BACKGROUND: The current bovine genomic reference sequence was assembled from a Hereford cow. The resulting linear assembly lacks diversity because it does not contain allelic variation, a drawback of linear references that causes reference allele bias. High nucleotide diversity and the separation of individuals into hundreds of breeds make cattle ideally suited to investigate the optimal composition of variation-aware references.
RESULTS: We augment the bovine linear reference sequence (ARS-UCD1.2) with variants filtered for allele frequency in dairy (Brown Swiss, Holstein) and dual-purpose (Fleckvieh, Original Braunvieh) cattle breeds to construct either breed-specific or pan-genome reference graphs using the vg toolkit. We find that reads map more accurately to variation-aware references than to the linear reference if pre-selected variants are used to construct the genome graphs. Graphs that contain random variants do not improve read mapping over the linear reference sequence. Breed-specific augmented and pan-genome graphs yield nearly the same improvement in mapping accuracy over the linear reference. We construct a whole-genome graph that contains the Hereford-based reference sequence and 14 million alleles that have an alternate allele frequency greater than 0.03 in the Brown Swiss cattle breed. Our novel variation-aware reference facilitates accurate read mapping and unbiased sequence variant genotyping for SNPs and indels.
CONCLUSIONS: We develop the first variation-aware reference graph for an agricultural animal (10.5281/zenodo.3759712). Our novel reference structure improves sequence read mapping and variant genotyping over the linear reference. Our work is a first step towards the transition from linear to variation-aware reference structures in species with high genetic diversity and many sub-populations.
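The abstract's core idea, augmenting a linear reference with variants that pass an alternate-allele-frequency filter so that reads carrying those alleles no longer pay a mismatch penalty, can be illustrated with a toy sketch. It assumes SNV-only variants, made-up positions and frequencies, and brute-force ungapped matching; it is not how the vg toolkit constructs graphs or maps reads, and the names `select_variants`, `graph_paths` and `best_mismatches` are hypothetical. Only the 0.03 threshold comes from the abstract.

```python
from itertools import product

# Hypothetical toy data (not from the study): a short linear reference and
# candidate SNVs as (0-based position, REF base, ALT base, alternate allele
# frequency in one breed).
REFERENCE = "ACGTTGCAAC"
CANDIDATES = [
    (2, "G", "T", 0.25),  # common in the breed, kept
    (5, "G", "A", 0.01),  # rare, filtered out
    (8, "A", "G", 0.04),  # just above the threshold, kept
]
AF_THRESHOLD = 0.03  # the abstract's cut-off for the Brown Swiss whole-genome graph


def select_variants(candidates, threshold=AF_THRESHOLD):
    """Keep SNVs whose alternate allele frequency is strictly above the threshold."""
    return [v for v in candidates if v[3] > threshold]


def graph_paths(reference, variants):
    """Spell every sequence through a SNV-only graph.

    Each selected SNV becomes a bubble with a REF and an ALT branch, so k SNVs
    give 2**k paths; brute-force enumeration only suits toy inputs.
    """
    paths = []
    for choice in product((False, True), repeat=len(variants)):
        seq = list(reference)
        for use_alt, (pos, ref, alt, _af) in zip(choice, variants):
            if seq[pos] != ref:
                raise ValueError(f"REF {ref!r} does not match reference at {pos}")
            if use_alt:
                seq[pos] = alt
        paths.append("".join(seq))
    return paths


def best_mismatches(read, sequences):
    """Fewest mismatches over all ungapped placements of the read in any sequence."""
    best = len(read)
    for seq in sequences:
        for start in range(len(seq) - len(read) + 1):
            window = seq[start:start + len(read)]
            best = min(best, sum(a != b for a, b in zip(read, window)))
    return best


selected = select_variants(CANDIDATES)
read = "ACTTTG"  # carries the ALT allele T at reference position 2
linear = best_mismatches(read, [REFERENCE])
graph = best_mismatches(read, graph_paths(REFERENCE, selected))
print(len(selected), linear, graph)  # prints: 2 1 0
```

In this toy case the read carrying the alternate allele aligns with one mismatch to the linear sequence but matches one graph path exactly, which is the reference allele bias the abstract refers to; the frequency filter decides which alleles become graph branches at all.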
Record ID: pubmed-7385871
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Genome Biol (Research), BioMed Central, published 2020-07-27
License: © The Author(s) 2020. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.