
Assessing runs of Homozygosity: a comparison of SNP Array and whole genome sequence low coverage data

Bibliographic Details
Main authors: Ceballos, Francisco C.; Hazelhurst, Scott; Ramsay, Michèle
Format: Online, Article, Text
Language: English
Published: BioMed Central 2018
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5789638/
https://www.ncbi.nlm.nih.gov/pubmed/29378520
http://dx.doi.org/10.1186/s12864-018-4489-0
author Ceballos, Francisco C.
Hazelhurst, Scott
Ramsay, Michèle
collection PubMed
description BACKGROUND: Runs of Homozygosity (ROH) are genomic regions where identical haplotypes are inherited from each parent. Since they were first detected in the late 1990s, thanks to technological advances, ROH have shed light on human population history and helped decipher the genetic basis of monogenic and complex traits and diseases. ROH studies have predominantly used SNP array data but are gradually moving to whole genome sequence (WGS) data as these become available. WGS data, which capture more genetic variability, can add value to ROH studies but require additional considerations during analysis. RESULTS: Using SNP array and low coverage WGS data from 1885 individuals from 20 world populations, we aimed to compare ROH from the two datasets and to establish software settings that yield comparable results, thus providing guidelines for combining disparate datasets in joint ROH analyses. By allowing heterozygous SNPs per window, and by using the PLINK homozygosity function and non-parametric analysis, we obtained non-significant differences in the number of ROH, mean ROH size and total sum of ROH between the datasets from the two technologies for almost all populations. CONCLUSIONS: By allowing 3 heterozygous SNPs per ROH when dealing with low coverage WGS data, it is possible to make meaningful comparisons between SNP array and low coverage WGS data. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12864-018-4489-0) contains supplementary material, which is available to authorized users.
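The RESULTS compare three per-individual summaries (number of ROH, mean ROH size and total sum of ROH) between the SNP array and WGS datasets with a non-parametric test. The record does not name the test, so the sketch below assumes a two-sided Mann-Whitney U test using the normal approximation with a tie correction and no continuity correction. The `RohSummary` type and the function names are hypothetical, and ROH are assumed to be given as (start_bp, end_bp) pairs per individual, for example taken from the POS1/POS2 columns of a PLINK `--homozyg` `.hom` report.

```python
"""Per-individual ROH summaries and a two-sided Mann-Whitney U comparison.

Assumptions (not from the record): ROH are (start_bp, end_bp) tuples per
individual; platforms are compared with a Mann-Whitney U test using the
normal approximation with tie correction and no continuity correction.
"""
import math
from dataclasses import dataclass


@dataclass
class RohSummary:  # hypothetical container
    n_roh: int          # number of ROH segments
    sum_roh_mb: float   # total length of all ROH, in Mb
    mean_roh_mb: float  # mean ROH length, in Mb (0.0 if there are no ROH)


def summarize(segments):
    """Summarize one individual's ROH segments given as (start_bp, end_bp)."""
    lengths_mb = [(end - start) / 1e6 for start, end in segments]
    n = len(lengths_mb)
    total = math.fsum(lengths_mb)
    return RohSummary(n, total, total / n if n else 0.0)


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test; returns (U for sample x, p-value)."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    # Tie correction: sum of (t^3 - t) over each group of t tied values.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        return u1, 1.0  # every value tied: no evidence of a difference
    z = (u1 - n1 * n2 / 2) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return u1, p


if __name__ == "__main__":
    # Hypothetical toy values (total ROH in Mb per individual), illustration only.
    array_sroh = [10.2, 12.5, 9.8, 15.1, 11.0]
    wgs_sroh = [10.9, 12.1, 10.4, 14.7, 11.6]
    u, p = mann_whitney_u(array_sroh, wgs_sroh)
    print(f"U = {u:.1f}, p = {p:.3f}")
```

The same call would be repeated for each summary (`n_roh`, `mean_roh_mb`, `sum_roh_mb`) and each population. If, as the abstract suggests, the same individuals appear in both datasets, a paired test such as the Wilcoxon signed-rank test may be a better fit; SciPy provides both as `scipy.stats.mannwhitneyu` and `scipy.stats.wilcoxon`.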
format Online
Article
Text
id pubmed-5789638
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5789638 2018-02-08 Assessing runs of Homozygosity: a comparison of SNP Array and whole genome sequence low coverage data. Ceballos, Francisco C.; Hazelhurst, Scott; Ramsay, Michèle. BMC Genomics, Methodology Article. BioMed Central, 2018-01-30. /pmc/articles/PMC5789638/ /pubmed/29378520 http://dx.doi.org/10.1186/s12864-018-4489-0 Text en © The Author(s) 2018. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Assessing runs of Homozygosity: a comparison of SNP Array and whole genome sequence low coverage data
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5789638/
https://www.ncbi.nlm.nih.gov/pubmed/29378520
http://dx.doi.org/10.1186/s12864-018-4489-0