Bovine breed-specific augmented reference graphs facilitate accurate sequence read mapping and unbiased variant discovery
BACKGROUND: The current bovine genomic reference sequence was assembled from a Hereford cow. The resulting linear assembly lacks diversity because it does not contain allelic variation, a drawback of linear references that causes reference allele bias. High nucleotide diversity and the separation of...
Main authors: | Crysnanto, Danang; Pausch, Hubert |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2020 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7385871/ https://www.ncbi.nlm.nih.gov/pubmed/32718320 http://dx.doi.org/10.1186/s13059-020-02105-0 |
_version_ | 1783563855859810304 |
---|---|
author | Crysnanto, Danang Pausch, Hubert |
author_sort | Crysnanto, Danang |
collection | PubMed |
description | BACKGROUND: The current bovine genomic reference sequence was assembled from a Hereford cow. The resulting linear assembly lacks diversity because it does not contain allelic variation, a drawback of linear references that causes reference allele bias. High nucleotide diversity and the separation of individuals by hundreds of breeds make cattle ideally suited to investigate the optimal composition of variation-aware references. RESULTS: We augment the bovine linear reference sequence (ARS-UCD1.2) with variants filtered for allele frequency in dairy (Brown Swiss, Holstein) and dual-purpose (Fleckvieh, Original Braunvieh) cattle breeds to construct either breed-specific or pan-genome reference graphs using the vg toolkit. We find that read mapping is more accurate to variation-aware than linear references if pre-selected variants are used to construct the genome graphs. Graphs that contain random variants do not improve read mapping over the linear reference sequence. Breed-specific augmented and pan-genome graphs enable almost similar mapping accuracy improvements over the linear reference. We construct a whole-genome graph that contains the Hereford-based reference sequence and 14 million alleles that have alternate allele frequency greater than 0.03 in the Brown Swiss cattle breed. Our novel variation-aware reference facilitates accurate read mapping and unbiased sequence variant genotyping for SNPs and Indels. CONCLUSIONS: We develop the first variation-aware reference graph for an agricultural animal (10.5281/zenodo.3759712). Our novel reference structure improves sequence read mapping and variant genotyping over the linear reference. Our work is a first step towards the transition from linear to variation-aware reference structures in species with high genetic diversity and many sub-populations. |
format | Online Article Text |
id | pubmed-7385871 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-7385871 2020-07-30 Bovine breed-specific augmented reference graphs facilitate accurate sequence read mapping and unbiased variant discovery Crysnanto, Danang Pausch, Hubert Genome Biol Research BioMed Central 2020-07-27 /pmc/articles/PMC7385871/ /pubmed/32718320 http://dx.doi.org/10.1186/s13059-020-02105-0 Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | Bovine breed-specific augmented reference graphs facilitate accurate sequence read mapping and unbiased variant discovery |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7385871/ https://www.ncbi.nlm.nih.gov/pubmed/32718320 http://dx.doi.org/10.1186/s13059-020-02105-0 |