
Relationship Estimation from Whole-Genome Sequence Data

The determination of the relationship between a pair of individuals is a fundamental application of genetics. Previously, we and others have demonstrated that identity-by-descent (IBD) information generated from high-density single-nucleotide polymorphism (SNP) data can greatly improve the power and...


Bibliographic Details
Main Authors: Li, Hong; Glusman, Gustavo; Hu, Hao; Shankaracharya; Caballero, Juan; Hubley, Robert; Witherspoon, David; Guthery, Stephen L.; Mauldin, Denise E.; Jorde, Lynn B.; Hood, Leroy; Roach, Jared C.; Huff, Chad D.
Format: Online Article Text
Language: English
Published: Public Library of Science 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3907355/
https://www.ncbi.nlm.nih.gov/pubmed/24497848
http://dx.doi.org/10.1371/journal.pgen.1004144
_version_ 1782301587271581696
author Li, Hong
Glusman, Gustavo
Hu, Hao
Shankaracharya,
Caballero, Juan
Hubley, Robert
Witherspoon, David
Guthery, Stephen L.
Mauldin, Denise E.
Jorde, Lynn B.
Hood, Leroy
Roach, Jared C.
Huff, Chad D.
author_sort Li, Hong
collection PubMed
description The determination of the relationship between a pair of individuals is a fundamental application of genetics. Previously, we and others have demonstrated that identity-by-descent (IBD) information generated from high-density single-nucleotide polymorphism (SNP) data can greatly improve the power and accuracy of genetic relationship detection. Whole-genome sequencing (WGS) marks the final step in increasing genetic marker density by assaying all single-nucleotide variants (SNVs), and thus has the potential to further improve relationship detection by enabling more accurate detection of IBD segments and more precise resolution of IBD segment boundaries. However, WGS introduces new complexities that must be addressed in order to achieve these improvements in relationship detection. To evaluate these complexities, we estimated genetic relationships from WGS data for 1490 known pairwise relationships among 258 individuals in 30 families along with 46 population samples as controls. We identified several genomic regions with excess pairwise IBD in both the pedigree and control datasets using three established IBD methods: GERMLINE, fastIBD, and ISCA. These spurious IBD segments produced a 10-fold increase in the rate of detected false-positive relationships among controls compared to high-density microarray datasets. To address this issue, we developed a new method to identify and mask genomic regions with excess IBD. This method, implemented in ERSA 2.0, fully resolved the inflated cryptic relationship detection rates while improving relationship estimation accuracy. ERSA 2.0 detected all 1st through 6th degree relationships, and 55% of 9th through 11th degree relationships in the 30 families. We estimate that WGS data provides a 5% to 15% increase in relationship detection power relative to high-density microarray data for distant relationships. Our results identify regions of the genome that are highly problematic for IBD mapping and introduce new software to accurately detect 1st through 9th degree relationships from whole-genome sequence data.
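
The idea of identifying and masking regions with excess IBD, as mentioned in the description, can be pictured with a small sketch. This is not the ERSA 2.0 implementation: the fixed-size windows, the fraction threshold, the choice to drop whole segments, and the function names `excess_ibd_windows` and `mask_segments` are all assumptions made for illustration only.

```python
def _windows(start, end, window_size):
    """Indices of the fixed-size windows touched by the half-open interval [start, end)."""
    return range(start // window_size, (end - 1) // window_size + 1)


def excess_ibd_windows(segments, window_size, max_fraction, n_pairs):
    """Return sorted window indices where too many pairs share IBD.

    segments: iterable of (pair_id, start, end) on a single chromosome.
    A window is flagged when the fraction of the n_pairs (assumed unrelated
    control pairs) with a segment in it exceeds max_fraction.
    """
    pairs_per_window = {}
    for pair_id, start, end in segments:
        for w in _windows(start, end, window_size):
            pairs_per_window.setdefault(w, set()).add(pair_id)
    return sorted(
        w for w, pairs in pairs_per_window.items()
        if len(pairs) / n_pairs > max_fraction
    )


def mask_segments(segments, masked_windows, window_size):
    """Keep only segments that do not overlap any flagged window (hypothetical policy)."""
    masked = set(masked_windows)
    return [
        (pair_id, start, end)
        for pair_id, start, end in segments
        if not any(w in masked for w in _windows(start, end, window_size))
    ]


if __name__ == "__main__":
    # Toy data: three of four control pairs share IBD near position 0.
    toy = [("A-B", 0, 1000), ("C-D", 500, 1500), ("E-F", 900, 1100), ("A-B", 5000, 6000)]
    flagged = excess_ibd_windows(toy, window_size=1000, max_fraction=0.5, n_pairs=4)
    print(flagged, mask_segments(toy, flagged, window_size=1000))
```

Flagging on control pairs reflects the description's observation that spurious IBD appears even among population controls. A real method might trim segments at region boundaries instead of discarding them.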
format Online
Article
Text
id pubmed-3907355
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3907355 2014-02-04 Relationship Estimation from Whole-Genome Sequence Data Li, Hong Glusman, Gustavo Hu, Hao Shankaracharya, Caballero, Juan Hubley, Robert Witherspoon, David Guthery, Stephen L. Mauldin, Denise E. Jorde, Lynn B. Hood, Leroy Roach, Jared C. Huff, Chad D. PLoS Genet Research Article The determination of the relationship between a pair of individuals is a fundamental application of genetics. Previously, we and others have demonstrated that identity-by-descent (IBD) information generated from high-density single-nucleotide polymorphism (SNP) data can greatly improve the power and accuracy of genetic relationship detection. Whole-genome sequencing (WGS) marks the final step in increasing genetic marker density by assaying all single-nucleotide variants (SNVs), and thus has the potential to further improve relationship detection by enabling more accurate detection of IBD segments and more precise resolution of IBD segment boundaries. However, WGS introduces new complexities that must be addressed in order to achieve these improvements in relationship detection. To evaluate these complexities, we estimated genetic relationships from WGS data for 1490 known pairwise relationships among 258 individuals in 30 families along with 46 population samples as controls. We identified several genomic regions with excess pairwise IBD in both the pedigree and control datasets using three established IBD methods: GERMLINE, fastIBD, and ISCA. These spurious IBD segments produced a 10-fold increase in the rate of detected false-positive relationships among controls compared to high-density microarray datasets. To address this issue, we developed a new method to identify and mask genomic regions with excess IBD. This method, implemented in ERSA 2.0, fully resolved the inflated cryptic relationship detection rates while improving relationship estimation accuracy. ERSA 2.0 detected all 1st through 6th degree relationships, and 55% of 9th through 11th degree relationships in the 30 families. We estimate that WGS data provides a 5% to 15% increase in relationship detection power relative to high-density microarray data for distant relationships. Our results identify regions of the genome that are highly problematic for IBD mapping and introduce new software to accurately detect 1st through 9th degree relationships from whole-genome sequence data. Public Library of Science 2014-01-30 /pmc/articles/PMC3907355/ /pubmed/24497848 http://dx.doi.org/10.1371/journal.pgen.1004144 Text en © 2014 Li et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Relationship Estimation from Whole-Genome Sequence Data
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3907355/
https://www.ncbi.nlm.nih.gov/pubmed/24497848
http://dx.doi.org/10.1371/journal.pgen.1004144