Construction of relatedness matrices using genotyping-by-sequencing data
BACKGROUND: Genotyping-by-sequencing (GBS) is becoming an attractive alternative to array-based methods for genotyping individuals for a large number of single nucleotide polymorphisms (SNPs). Costs can be lowered by reducing the mean sequencing depth, but this results in genotype calls of lower qua...
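Main authors: | Dodds, Ken G., McEwan, John C., Brauning, Rudiger, Anderson, Rayna M., van Stijn, Tracey C., Kristjánsson, Theodor, Clarke, Shannon M. |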
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2015 |
Subjects: | Methodology Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4675043/ https://www.ncbi.nlm.nih.gov/pubmed/26654230 http://dx.doi.org/10.1186/s12864-015-2252-3 |
_version_ | 1782405002654908416 |
---|---|
author | Dodds, Ken G. McEwan, John C. Brauning, Rudiger Anderson, Rayna M. van Stijn, Tracey C. Kristjánsson, Theodor Clarke, Shannon M. |
author_sort | Dodds, Ken G. |
collection | PubMed |
description | BACKGROUND: Genotyping-by-sequencing (GBS) is becoming an attractive alternative to array-based methods for genotyping individuals for a large number of single nucleotide polymorphisms (SNPs). Costs can be lowered by reducing the mean sequencing depth, but this results in genotype calls of lower quality. A common analysis strategy is to filter SNPs to just those with sufficient depth, thereby greatly reducing the number of SNPs available. We investigate methods for estimating relatedness using GBS data, including results of low depth, using theoretical calculation, simulation and application to a real data set. RESULTS: We show that unbiased estimates of relatedness can be obtained by using only those SNPs with genotype calls in both individuals. The expected value of this estimator is independent of the SNP depth in each individual, under a model of genotype calling that includes the special case of the two alleles being read at random. In contrast, the estimator of self-relatedness does depend on the SNP depth, and we provide a modification to provide unbiased estimates of self-relatedness. We refer to these methods of estimation as kinship using GBS with depth adjustment (KGD). The estimators can be calculated using matrix methods, which allow efficient computation. Simulation results were consistent with the methods being unbiased, and suggest that the optimal sequencing depth is around 2–4 for relatedness between individuals and 5–10 for self-relatedness. Application to a real data set revealed that some SNP filtering may still be necessary, for the exclusion of SNPs which did not behave in a Mendelian fashion. A simple graphical method (a ‘fin plot’) is given to illustrate this issue and to guide filtering parameters. CONCLUSION: We provide a method which gives unbiased estimates of relatedness, based on SNPs assayed by GBS, which accounts for the depth (including zero depth) of the genotype calls. This allows GBS to be applied at read depths which can be chosen to optimise the information obtained. SNPs with excess heterozygosity, often due to (partial) polyploidy or other duplications, can be filtered based on a simple graphical method. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-015-2252-3) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-4675043 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-46750432015-12-11 Construction of relatedness matrices using genotyping-by-sequencing data Dodds, Ken G. McEwan, John C. Brauning, Rudiger Anderson, Rayna M. van Stijn, Tracey C. Kristjánsson, Theodor Clarke, Shannon M. BMC Genomics Methodology Article BACKGROUND: Genotyping-by-sequencing (GBS) is becoming an attractive alternative to array-based methods for genotyping individuals for a large number of single nucleotide polymorphisms (SNPs). Costs can be lowered by reducing the mean sequencing depth, but this results in genotype calls of lower quality. A common analysis strategy is to filter SNPs to just those with sufficient depth, thereby greatly reducing the number of SNPs available. We investigate methods for estimating relatedness using GBS data, including results of low depth, using theoretical calculation, simulation and application to a real data set. RESULTS: We show that unbiased estimates of relatedness can be obtained by using only those SNPs with genotype calls in both individuals. The expected value of this estimator is independent of the SNP depth in each individual, under a model of genotype calling that includes the special case of the two alleles being read at random. In contrast, the estimator of self-relatedness does depend on the SNP depth, and we provide a modification to provide unbiased estimates of self-relatedness. We refer to these methods of estimation as kinship using GBS with depth adjustment (KGD). The estimators can be calculated using matrix methods, which allow efficient computation. Simulation results were consistent with the methods being unbiased, and suggest that the optimal sequencing depth is around 2–4 for relatedness between individuals and 5–10 for self-relatedness. Application to a real data set revealed that some SNP filtering may still be necessary, for the exclusion of SNPs which did not behave in a Mendelian fashion. 
A simple graphical method (a ‘fin plot’) is given to illustrate this issue and to guide filtering parameters. CONCLUSION: We provide a method which gives unbiased estimates of relatedness, based on SNPs assayed by GBS, which accounts for the depth (including zero depth) of the genotype calls. This allows GBS to be applied at read depths which can be chosen to optimise the information obtained. SNPs with excess heterozygosity, often due to (partial) polyploidy or other duplications can be filtered based on a simple graphical method. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-015-2252-3) contains supplementary material, which is available to authorized users. BioMed Central 2015-12-09 /pmc/articles/PMC4675043/ /pubmed/26654230 http://dx.doi.org/10.1186/s12864-015-2252-3 Text en © Dodds et al. 2015 Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Methodology Article Dodds, Ken G. McEwan, John C. Brauning, Rudiger Anderson, Rayna M. van Stijn, Tracey C. Kristjánsson, Theodor Clarke, Shannon M. Construction of relatedness matrices using genotyping-by-sequencing data |
title | Construction of relatedness matrices using genotyping-by-sequencing data |
title_sort | construction of relatedness matrices using genotyping-by-sequencing data |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4675043/ https://www.ncbi.nlm.nih.gov/pubmed/26654230 http://dx.doi.org/10.1186/s12864-015-2252-3 |
work_keys_str_mv | AT doddskeng constructionofrelatednessmatricesusinggenotypingbysequencingdata AT mcewanjohnc constructionofrelatednessmatricesusinggenotypingbysequencingdata AT brauningrudiger constructionofrelatednessmatricesusinggenotypingbysequencingdata AT andersonraynam constructionofrelatednessmatricesusinggenotypingbysequencingdata AT vanstijntraceyc constructionofrelatednessmatricesusinggenotypingbysequencingdata AT kristjanssontheodor constructionofrelatednessmatricesusinggenotypingbysequencingdata AT clarkeshannonm constructionofrelatednessmatricesusinggenotypingbysequencingdata |
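
The abstract in the description field states that relatedness between two individuals can be estimated from only those SNPs with genotype calls in both individuals. The record does not give the formulae, so the following is only a minimal sketch of that idea, assuming a standard genomic-relationship form (centred allele counts divided by the sum of 2p(1-p)) restricted to co-called SNPs. The function names, the encoding of a zero-depth call as `None`, and the toy data are hypothetical and are not taken from the article. The depth adjustment for self-relatedness that the abstract mentions is not implemented here.

```python
from typing import List, Optional

# Assumed encoding: 0, 1 or 2 copies of the reference allele,
# or None when the SNP has zero depth (no call) in that individual.
Genotype = Optional[int]


def allele_frequencies(genos: List[List[Genotype]]) -> List[float]:
    """Estimate the reference allele frequency of each SNP from called genotypes only."""
    n_snps = len(genos[0])
    freqs = []
    for j in range(n_snps):
        called = [row[j] for row in genos if row[j] is not None]
        freqs.append(sum(called) / (2 * len(called)) if called else 0.0)
    return freqs


def relatedness(genos: List[List[Genotype]], freqs: List[float], i: int, k: int) -> float:
    """Relatedness of individuals i and k using only SNPs called in both.

    Assumed form: sum of centred cross-products over co-called SNPs,
    divided by the sum of 2p(1-p) over the same SNPs.
    Returns NaN when no informative co-called SNP exists.
    """
    numerator = 0.0
    denominator = 0.0
    for g_i, g_k, p in zip(genos[i], genos[k], freqs):
        if g_i is None or g_k is None:
            continue  # skip SNPs missing in either individual
        numerator += (g_i - 2 * p) * (g_k - 2 * p)
        denominator += 2 * p * (1 - p)
    return numerator / denominator if denominator > 0 else float("nan")


# Toy data (made up for illustration): 3 individuals, 4 SNPs.
toy_genos = [
    [2, 0, None, 1],
    [2, None, 1, 1],
    [0, 2, 1, 1],
]
toy_freqs = allele_frequencies(toy_genos)
r01 = relatedness(toy_genos, toy_freqs, 0, 1)
```

In this sketch the denominator is recomputed for every pair because the set of co-called SNPs differs between pairs. The abstract notes that the estimators can be calculated using matrix methods; the per-pair loops here are written out only for readability.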