Accurate sequence variant genotyping in cattle using variation-aware genome graphs
BACKGROUND: Genotyping of sequence variants typically involves, as a first step, the alignment of sequencing reads to a linear reference genome. Because a linear reference genome represents only a small fraction of all the DNA sequence variation within a species, reference allele bias may occur at h...
Main authors: | Crysnanto, Danang, Wurmser, Christine, Pausch, Hubert |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2019 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6521551/ https://www.ncbi.nlm.nih.gov/pubmed/31092189 http://dx.doi.org/10.1186/s12711-019-0462-x |
_version_ | 1783418984908980224 |
---|---|
author | Crysnanto, Danang Wurmser, Christine Pausch, Hubert |
author_facet | Crysnanto, Danang Wurmser, Christine Pausch, Hubert |
author_sort | Crysnanto, Danang |
collection | PubMed |
description | BACKGROUND: Genotyping of sequence variants typically involves, as a first step, the alignment of sequencing reads to a linear reference genome. Because a linear reference genome represents only a small fraction of all the DNA sequence variation within a species, reference allele bias may occur at highly polymorphic or divergent regions of the genome. Graph-based methods facilitate the comparison of sequencing reads to a variation-aware genome graph, which incorporates a collection of non-redundant DNA sequences that segregate within a species. We compared the accuracy and sensitivity of graph-based sequence variant genotyping using the Graphtyper software to two widely-used methods, i.e., GATK and SAMtools, which rely on linear reference genomes using whole-genome sequencing data from 49 Original Braunvieh cattle. RESULTS: We discovered 21,140,196, 20,262,913, and 20,668,459 polymorphic sites using GATK, Graphtyper, and SAMtools, respectively. Comparisons between sequence variant genotypes and microarray-derived genotypes showed that Graphtyper outperformed both GATK and SAMtools in terms of genotype concordance, non-reference sensitivity, and non-reference discrepancy. The sequence variant genotypes that were obtained using Graphtyper had the smallest number of Mendelian inconsistencies between sequence-derived single nucleotide polymorphisms and indels in nine sire-son pairs. Genotype phasing and imputation using the Beagle software improved the quality of the sequence variant genotypes for all the tools evaluated, particularly for animals that were sequenced at low coverage. Following imputation, the concordance between sequence- and microarray-derived genotypes was almost identical for the three methods evaluated, i.e., 99.32, 99.46, and 99.24% for GATK, Graphtyper, and SAMtools, respectively. Variant filtration based on commonly used criteria improved genotype concordance slightly but it also decreased sensitivity. Graphtyper required considerably more computing resources than SAMtools but less than GATK. CONCLUSIONS: Sequence variant genotyping using Graphtyper is accurate, sensitive and computationally feasible in cattle. Graph-based methods enable sequence variant genotyping from variation-aware reference genomes that may incorporate cohort-specific sequence variants, which is not possible with the current implementation of state-of-the-art methods that rely on linear reference genomes. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12711-019-0462-x) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-6521551 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6521551 2019-05-23 Accurate sequence variant genotyping in cattle using variation-aware genome graphs Crysnanto, Danang Wurmser, Christine Pausch, Hubert Genet Sel Evol Research Article BioMed Central 2019-05-15 /pmc/articles/PMC6521551/ /pubmed/31092189 http://dx.doi.org/10.1186/s12711-019-0462-x Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Article Crysnanto, Danang Wurmser, Christine Pausch, Hubert Accurate sequence variant genotyping in cattle using variation-aware genome graphs |
title | Accurate sequence variant genotyping in cattle using variation-aware genome graphs |
title_sort | accurate sequence variant genotyping in cattle using variation-aware genome graphs |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6521551/ https://www.ncbi.nlm.nih.gov/pubmed/31092189 http://dx.doi.org/10.1186/s12711-019-0462-x |
work_keys_str_mv | AT crysnantodanang accuratesequencevariantgenotypingincattleusingvariationawaregenomegraphs AT wurmserchristine accuratesequencevariantgenotypingincattleusingvariationawaregenomegraphs AT pauschhubert accuratesequencevariantgenotypingincattleusingvariationawaregenomegraphs |
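
The abstract in this record compares genotyping tools by genotype concordance, non-reference sensitivity, and non-reference discrepancy against microarray genotypes. As a rough illustration only, the sketch below computes these three metrics from paired genotypes using definitions that are commonly used for such comparisons; it is an assumption that they match the article's exact definitions. The function name `concordance_metrics`, the 0/1/2 genotype coding (count of alternate alleles), and the toy data are hypothetical and do not come from the article or from any of the tools it evaluates.

```python
from collections import Counter

# Genotypes are coded as the number of alternate alleles (an assumed coding).
HOM_REF, HET, HOM_ALT = 0, 1, 2


def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def concordance_metrics(array_gt, seq_gt):
    """Compare microarray genotypes (treated as truth) with sequence genotypes.

    Both inputs are equal-length sequences of 0/1/2 codes for the same
    animal-by-site pairs, with missing genotypes already removed.
    """
    if len(array_gt) != len(seq_gt):
        raise ValueError("genotype lists must have the same length")

    # Count every (array genotype, sequence genotype) combination.
    pairs = Counter(zip(array_gt, seq_gt))
    total = sum(pairs.values())
    matches = sum(n for (a, s), n in pairs.items() if a == s)
    hom_ref_matches = pairs[(HOM_REF, HOM_REF)]

    # Sites that are non-reference on the array, and the subset of those
    # that are also called non-reference from sequence data.
    array_non_ref = sum(n for (a, _), n in pairs.items() if a != HOM_REF)
    both_non_ref = sum(
        n for (a, s), n in pairs.items() if a != HOM_REF and s != HOM_REF
    )

    return {
        # Share of all compared genotypes that agree exactly.
        "concordance": _ratio(matches, total),
        # Share of array non-reference genotypes recovered as non-reference.
        "non_ref_sensitivity": _ratio(both_non_ref, array_non_ref),
        # Share of disagreements after ignoring concordant hom-ref calls.
        "non_ref_discrepancy": _ratio(total - matches, total - hom_ref_matches),
    }


if __name__ == "__main__":
    # Toy data, invented for illustration.
    array = [0, 0, 1, 1, 2, 2, 1, 0]
    sequence = [0, 0, 1, 2, 2, 2, 0, 1]
    print(concordance_metrics(array, sequence))
```

Non-reference discrepancy excludes concordant homozygous-reference genotypes because they are usually the most frequent class and would otherwise mask disagreements at variant sites.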