Investigating the impact of reference assembly choice on genomic analyses in a cattle breed

Bibliographic details
Main authors: Lloret-Villas, Audald; Bhati, Meenu; Kadri, Naveen Kumar; Fries, Ruedi; Pausch, Hubert
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8132449/
https://www.ncbi.nlm.nih.gov/pubmed/34011274
http://dx.doi.org/10.1186/s12864-021-07554-w

Abstract

BACKGROUND: Reference-guided read alignment and variant genotyping are prone to reference allele bias, particularly for samples that are greatly divergent from the reference genome. A Hereford-based assembly is the widely accepted bovine reference genome. Haplotype-resolved genomes that exceed the current bovine reference genome in quality and continuity have been assembled for different breeds of cattle. Using whole-genome sequencing data of 161 Brown Swiss cattle, we compared the accuracy of read mapping and sequence variant genotyping, as well as downstream genomic analyses, between the bovine reference genome (ARS-UCD1.2) and a highly continuous Angus-based assembly (UOA_Angus_1).

RESULTS: Read mapping accuracy did not differ notably between the ARS-UCD1.2 and UOA_Angus_1 assemblies. We discovered 22,744,517 and 22,559,675 high-quality variants from ARS-UCD1.2 and UOA_Angus_1, respectively. For both assemblies, the concordance between sequence- and array-called genotypes was high, and the number of variants deviating from Hardy-Weinberg proportions at segregating sites was low. More artefactual INDELs were genotyped from UOA_Angus_1 than from ARS-UCD1.2 alignments. Using the composite likelihood ratio test, we detected 40 and 33 signatures of selection from ARS-UCD1.2 and UOA_Angus_1, respectively, but the overlap between the two assemblies was low. Using the 161 sequenced Brown Swiss cattle as a reference panel and a two-step imputation approach, we imputed sequence variant genotypes into a mapping cohort of 30,499 cattle with microarray-derived genotypes. Imputation accuracy (Beagle R²) was very high (0.87) for both assemblies. Genome-wide association studies between imputed sequence variant genotypes and six dairy traits as well as stature produced almost identical results with both assemblies.

CONCLUSIONS: The ARS-UCD1.2 and UOA_Angus_1 assemblies are suitable for reference-guided genome analyses in Brown Swiss cattle. Although differences in read mapping and genotyping accuracy between the two assemblies are negligible, the choice of reference genome strongly affects the detection, with the composite likelihood ratio test, of signatures of selection that had already reached fixation. We developed a workflow that can be adapted and reused to compare the impact of reference genomes on genome analyses in various breeds, populations and species.

SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at doi:10.1186/s12864-021-07554-w.
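
The results report two genotype quality checks: concordance between sequence- and array-called genotypes, and deviation from Hardy-Weinberg proportions at segregating sites. A minimal Python sketch of both metrics follows. It assumes genotypes coded as alternate-allele counts (0, 1, 2) with None for missing calls, and that array alleles were already harmonised to the reference allele of the assembly in use (important when comparing two assemblies, whose reference alleles can differ). The function names are hypothetical, and the Pearson chi-square test is a simple stand-in, not necessarily the test or software the authors used.

```python
import math
from typing import Optional, Sequence

# Genotype coded as the number of alternate alleles (0, 1 or 2); None marks a missing call.
# Assumption: sequence and array calls refer to the same animals and variants, in the same
# order, and array alleles were flipped to match the assembly's reference allele.
Genotype = Optional[int]


def genotype_concordance(seq_calls: Sequence[Genotype],
                         array_calls: Sequence[Genotype]) -> float:
    """Fraction of identical genotypes among calls made by both platforms."""
    if len(seq_calls) != len(array_calls):
        raise ValueError("call vectors must have the same length")
    pairs = [(s, a) for s, a in zip(seq_calls, array_calls)
             if s is not None and a is not None]
    if not pairs:
        raise ValueError("no genotype was called by both platforms")
    return sum(s == a for s, a in pairs) / len(pairs)


def hwe_chisq_pvalue(genotypes: Sequence[Genotype]) -> float:
    """P-value of a 1-df Pearson chi-square test for Hardy-Weinberg proportions."""
    calls = [g for g in genotypes if g is not None]
    if not calls:
        raise ValueError("no called genotypes")
    if any(g not in (0, 1, 2) for g in calls):
        raise ValueError("genotypes must be 0, 1, 2 or None")
    n = len(calls)
    n_ref_hom, n_het, n_alt_hom = calls.count(0), calls.count(1), calls.count(2)
    p = (2 * n_ref_hom + n_het) / (2 * n)  # reference allele frequency
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        # Monomorphic site: the test is undefined, so report no evidence of deviation.
        return 1.0
    observed = (n_ref_hom, n_het, n_alt_hom)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))
```

For example, `genotype_concordance([0, 1, 2, 1, None, 0], [0, 1, 1, 1, 2, 0])` returns 0.8: five positions are called by both platforms and four of them agree. With small samples or rare variants, an exact Hardy-Weinberg test is usually preferred over the chi-square approximation shown here.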

Record: pubmed-8132449 (MEDLINE/PubMed format), National Center for Biotechnology Information

Journal: BMC Genomics, Research Article (BioMed Central), published 2021-05-19
Copyright and license: © The Author(s) 2021. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.