Using whole genome sequence to compare variant callers and breed differences of US sheep

Bibliographic Details
Main authors: Stegemiller, Morgan R., Redden, Reid R., Notter, David R., Taylor, Todd, Taylor, J. Bret, Cockett, Noelle E., Heaton, Michael P., Kalbfleisch, Theodore S., Murdoch, Brenda M.
Format: Online Article Text
Language: English
Published: Frontiers Media S.A., 2023
Subjects: Genetics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9846548/
https://www.ncbi.nlm.nih.gov/pubmed/36685812
http://dx.doi.org/10.3389/fgene.2022.1060882
collection PubMed
description As whole genome sequence (WGS) data sets have become abundant and widely available, so has the need for variant detection and scoring. The aim of this study was to compare the accuracy of two commonly used variant calling programs, Freebayes and GATK HaplotypeCaller (GATK-HC), and to use U.S. sheep WGS data sets to identify novel breed-associated SNPs. Sequence data from 145 sheep representing 14 U.S. breeds were filtered, and biallelic single nucleotide polymorphisms (SNPs) were retained for genotyping analyses. Genotypes from both programs were compared to each other and to genotypes from bead arrays. To analyze genetic diversity, the WGS SNPs were compared with the bead array data in terms of breed heterozygosity, principal component analysis (PCA), and the identification of breed-associated SNPs. Compared with GATK-HC, Freebayes reported an average sequence read depth 2.78 reads greater and identified 6.11% more SNPs. The genotype concordance with bead array data was 96.0% for Freebayes and 95.5% for GATK-HC. Genotyping with WGS identified 10.5 million SNPs across all 145 sheep. Relative to the bead array analysis, this resulted in an 8% increase in measured heterozygosity and greater breed separation in the PCA. A total of 1,849 SNPs were identified only in the Romanov sheep: all 10 Romanov rams were homozygous for one allele, whereas the remaining 135 sheep from 13 breeds were homozygous for the opposite allele. Both variant calling programs had greater than 95% concordance of SNPs with bead array data, and either was suitably accurate for ovine WGS data sets. The use of WGS SNPs improved the resolution of the PCA and was critical for identifying Romanov breed-associated SNPs. Subsets of such SNPs could be used to estimate germplasm composition in animals without pedigree information.
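The two genotype comparisons summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes genotypes have already been extracted from the VCF or bead-array output into per-animal lists coded as alternate-allele counts (0, 1, 2, with None for a missing call), and every function name, animal ID and breed label in it is hypothetical. `genotype_concordance` uses one common definition of concordance (the share of sites called in both sources whose genotypes match), and `breed_private_fixed_snps` applies the Romanov criterion described in the abstract: every animal of the target breed is homozygous for one allele and every other animal is homozygous for the opposite allele.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical genotype coding: number of alternate alleles (0, 1 or 2),
# with None marking a missing call.
Genotype = Optional[int]
HOMOZYGOUS = ({0}, {2})  # a group sharing one homozygous genotype yields one of these sets


def genotype_concordance(calls_a: Sequence[Genotype], calls_b: Sequence[Genotype]) -> float:
    """Fraction of sites called in both sources whose genotypes are identical."""
    if len(calls_a) != len(calls_b):
        raise ValueError("genotype vectors must cover the same sites")
    compared = agree = 0
    for a, b in zip(calls_a, calls_b):
        if a is None or b is None:
            continue  # a site missing in either source is not compared
        compared += 1
        agree += int(a == b)
    if compared == 0:
        raise ValueError("no sites are called in both sources")
    return agree / compared


def breed_private_fixed_snps(
    genotypes: Dict[str, List[Genotype]],
    breed_of: Dict[str, str],
    target_breed: str,
) -> List[int]:
    """Return indices of SNPs where every animal of target_breed is homozygous
    for one allele and every other animal is homozygous for the other allele.
    Any missing call (None) at a site excludes that site."""
    targets = [a for a in genotypes if breed_of[a] == target_breed]
    others = [a for a in genotypes if breed_of[a] != target_breed]
    if not targets or not others:
        raise ValueError("need animals both inside and outside the target breed")
    n_sites = len(genotypes[targets[0]])
    hits = []
    for i in range(n_sites):
        target_calls = {genotypes[a][i] for a in targets}
        other_calls = {genotypes[a][i] for a in others}
        # Each group must share a single homozygous genotype, and the groups must differ.
        if target_calls in HOMOZYGOUS and other_calls in HOMOZYGOUS and target_calls != other_calls:
            hits.append(i)
    return hits


if __name__ == "__main__":
    # Toy data with hypothetical animal IDs and breed labels.
    print(genotype_concordance([0, 1, 2, 2, None], [0, 1, 1, 2, 0]))  # 0.75
    toy_genotypes = {
        "rom1": [2, 0, 1, 0],
        "rom2": [2, 0, 2, 0],
        "b1": [0, 0, 0, 2],
        "c1": [0, 2, 0, 2],
    }
    toy_breeds = {"rom1": "Romanov", "rom2": "Romanov", "b1": "breed_b", "c1": "breed_c"}
    print(breed_private_fixed_snps(toy_genotypes, toy_breeds, "Romanov"))  # [0, 3]
```

In the toy data, sites 0 and 3 qualify because the two Romanov animals share one homozygous genotype and both other animals share the opposite one; site 1 fails because the other breeds disagree, and site 2 fails because one Romanov animal is heterozygous. Requiring complete calls is a deliberately strict choice for the sketch; a real analysis would need an explicit policy for missing genotypes.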
id pubmed-9846548
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-9846548 2023-01-19 Front Genet Genetics Frontiers Media S.A. 2023-01-04 /pmc/articles/PMC9846548/ /pubmed/36685812 http://dx.doi.org/10.3389/fgene.2022.1060882 Text en
Copyright © 2023 Stegemiller, Redden, Notter, Taylor, Taylor, Cockett, Heaton, Kalbfleisch and Murdoch. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY), https://creativecommons.org/licenses/by/4.0/. The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
topic Genetics