Estimating variance components in population scale family trees

The rapid digitization of genealogical and medical records enables the assembly of extremely large pedigree records spanning millions of individuals and trillions of pairs of relatives. Such pedigrees provide the opportunity to investigate the sociological and epidemiological history of human popula...

Bibliographic Details
Main authors: Shor, Tal; Kalka, Iris; Geiger, Dan; Erlich, Yaniv; Weissbrod, Omer
Format: Online Article Text
Language: English
Published: Public Library of Science 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6529016/
https://www.ncbi.nlm.nih.gov/pubmed/31071088
http://dx.doi.org/10.1371/journal.pgen.1008124
_version_ 1783420323406807040
author Shor, Tal
Kalka, Iris
Geiger, Dan
Erlich, Yaniv
Weissbrod, Omer
collection PubMed
description The rapid digitization of genealogical and medical records enables the assembly of extremely large pedigree records spanning millions of individuals and trillions of pairs of relatives. Such pedigrees provide the opportunity to investigate the sociological and epidemiological history of human populations in scales much larger than previously possible. Linear mixed models (LMMs) are routinely used to analyze extremely large animal and plant pedigrees for the purposes of selective breeding. However, LMMs have not been previously applied to analyze population-scale human family trees. Here, we present Sparse Cholesky factorIzation LMM (Sci-LMM), a modeling framework for studying population-scale family trees that combines techniques from the animal and plant breeding literature and from human genetics literature. The proposed framework can construct a matrix of relationships between trillions of pairs of individuals and fit the corresponding LMM in several hours. We demonstrate the capabilities of Sci-LMM via simulation studies and by estimating the heritability of longevity and of reproductive fitness (quantified via number of children) in a large pedigree spanning millions of individuals and over five centuries of human history. Sci-LMM provides a unified framework for investigating the epidemiological history of human populations via genealogical records.
format Online
Article
Text
id pubmed-6529016
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-6529016 2019-05-31 Estimating variance components in population scale family trees Shor, Tal Kalka, Iris Geiger, Dan Erlich, Yaniv Weissbrod, Omer PLoS Genet Research Article
Public Library of Science 2019-05-09 /pmc/articles/PMC6529016/ /pubmed/31071088 http://dx.doi.org/10.1371/journal.pgen.1008124 Text en © 2019 Shor et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Estimating variance components in population scale family trees
topic Research Article