
Detection of identity by descent using next-generation whole genome sequencing data

BACKGROUND: Identity by descent (IBD) has played a fundamental role in the discovery of genetic loci underlying human diseases. Both pedigree-based and population-based linkage analyses rely on estimating recent IBD, and evidence of ancient IBD can be used to detect population structure in genetic a...


Bibliographic Details
Main Authors: Su, Shu-Yi; Kasberger, Jay; Baranzini, Sergio; Byerley, William; Liao, Wilson; Oksenberg, Jorge; Sherr, Elliott; Jorgenson, Eric
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3403908/
https://www.ncbi.nlm.nih.gov/pubmed/22672699
http://dx.doi.org/10.1186/1471-2105-13-121
author Su, Shu-Yi
Kasberger, Jay
Baranzini, Sergio
Byerley, William
Liao, Wilson
Oksenberg, Jorge
Sherr, Elliott
Jorgenson, Eric
collection PubMed
description BACKGROUND: Identity by descent (IBD) has played a fundamental role in the discovery of genetic loci underlying human diseases. Both pedigree-based and population-based linkage analyses rely on estimating recent IBD, and evidence of ancient IBD can be used to detect population structure in genetic association studies. Various methods for detecting IBD, including those implemented in the software programs fastIBD and GERMLINE, have been developed in the past several years using population genotype data from microarray platforms. Now, next-generation DNA sequencing data are becoming increasingly available, enabling the comprehensive analysis of genomes, including identifying rare variants. These sequencing data may provide an opportunity to detect IBD with higher resolution than previously possible, potentially enabling the detection of disease-causing loci that were previously undetectable with sparser genetic data.
RESULTS: Here, we investigate how different levels of variant coverage in sequencing and microarray genotype data influence the resolution at which IBD can be detected. This includes microarray genotype data from the WTCCC study, denser genotype data from the HapMap Project, low-coverage sequencing data from the 1000 Genomes Project, and deep-coverage complete genome data from our own projects. With high power (78%), we can detect segments of length 0.4 cM or larger using fastIBD and GERMLINE in sequencing data. This compares to similar power to detect segments of length 1.0 cM or higher with microarray genotype data. We find that GERMLINE has slightly higher power than fastIBD for detecting IBD segments using sequencing data, but also has a much higher false positive rate.
CONCLUSION: We further quantify the effect of variant density, conditional on genetic map length, on the power to resolve IBD segments. These investigations into IBD resolution may help guide the design of future next-generation sequencing studies that utilize IBD, including family-based association studies, association studies in admixed populations, and homozygosity mapping studies.
format Online
Article
Text
id pubmed-3403908
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3403908 2012-07-25 Detection of identity by descent using next-generation whole genome sequencing data Su, Shu-Yi; Kasberger, Jay; Baranzini, Sergio; Byerley, William; Liao, Wilson; Oksenberg, Jorge; Sherr, Elliott; Jorgenson, Eric BMC Bioinformatics Research Article BioMed Central 2012-06-06 /pmc/articles/PMC3403908/ /pubmed/22672699 http://dx.doi.org/10.1186/1471-2105-13-121 Text en Copyright ©2012 Su et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Detection of identity by descent using next-generation whole genome sequencing data
topic Research Article