
GENLIB: an R package for the analysis of genealogical data

BACKGROUND: Founder populations have an important role in the study of genetic diseases. Access to detailed genealogical records is often one of their advantages. These genealogical data provide unique information for researchers in evolutionary and population genetics, demography and genetic epidem...


Bibliographic Details
Main authors: Gauvin, Héloïse; Lefebvre, Jean-François; Moreau, Claudia; Lavoie, Eve-Marie; Labuda, Damian; Vézina, Hélène; Roy-Gagnon, Marie-Hélène
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4431039/
https://www.ncbi.nlm.nih.gov/pubmed/25971991
http://dx.doi.org/10.1186/s12859-015-0581-5
author Gauvin, Héloïse
Lefebvre, Jean-François
Moreau, Claudia
Lavoie, Eve-Marie
Labuda, Damian
Vézina, Hélène
Roy-Gagnon, Marie-Hélène
collection PubMed
description BACKGROUND: Founder populations have an important role in the study of genetic diseases. Access to detailed genealogical records is often one of their advantages. These genealogical data provide unique information for researchers in evolutionary and population genetics, demography and genetic epidemiology. However, analyzing large genealogical datasets requires specialized methods and software. The GENLIB software was developed to study the large genealogies of the French Canadian population of Quebec, Canada. These genealogies are accessible through the BALSAC database, which contains over 3 million records covering the whole province of Quebec over four centuries. Using this resource, extended pedigrees of up to 17 generations can be constructed from a sample of present-day individuals.
RESULTS: We have extended and implemented GENLIB as a package in the R environment for statistical computing and graphics, thus allowing optimal flexibility for users. The GENLIB package includes basic functions to manage genealogical data allowing, for example, extraction of a part of a genealogy or selection of specific individuals. There are also many functions providing information to describe the size and complexity of genealogies as well as functions to compute standard measures such as kinship, inbreeding and genetic contribution. GENLIB also includes functions for gene-dropping simulations. The goal of this paper is to present the full functionalities of GENLIB. We used a sample of 140 individuals from the province of Quebec (Canada) to demonstrate GENLIB’s functions. Ascending genealogies for these individuals were reconstructed using BALSAC, yielding a large pedigree of 41,523 individuals. Using GENLIB’s functions, we provide a detailed description of these genealogical data in terms of completeness, genetic contribution of founders, relatedness, inbreeding and the overall complexity of the genealogical tree. We also present gene-dropping simulations based on the whole genealogy to investigate identical-by-descent sharing of alleles and chromosomal segments of different lengths and estimate probabilities of identical-by-descent sharing.
CONCLUSIONS: The R package GENLIB provides a user-friendly and flexible environment to analyze extensive genealogical data, allowing an efficient and easy integration of different types of data, analytical methods and additional developments and making this tool ideal for genealogical analysis.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0581-5) contains supplementary material, which is available to authorized users.
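The abstract names kinship and inbreeding as standard measures computed from a genealogy. As an illustration only, the sketch below shows the textbook recursive definition of the kinship coefficient on a tiny pedigree. It is written in Python and is not GENLIB's API or implementation (GENLIB is an R package); the `PEDIGREE` table and the function names `depth`, `kinship` and `inbreeding` are hypothetical, and a plain recursion like this would not scale to a pedigree of tens of thousands of individuals.

```python
from functools import lru_cache

# Hypothetical toy pedigree: id -> (father, mother).
# None marks an unknown parent, so individuals 1 and 2 are founders.
PEDIGREE = {
    1: (None, None),
    2: (None, None),
    3: (1, 2),
    4: (1, 2),   # 3 and 4 are full siblings
    5: (3, 4),   # 5 is the child of a sibling mating
}


@lru_cache(maxsize=None)
def depth(ind):
    """Generation depth: 0 for founders, 1 + deepest parent otherwise."""
    if ind is None:
        return -1
    father, mother = PEDIGREE[ind]
    return 1 + max(depth(father), depth(mother))


@lru_cache(maxsize=None)
def kinship(a, b):
    """Probability that an allele drawn at random from a and one drawn
    at random from b are identical by descent (founders assumed unrelated)."""
    if a is None or b is None:
        return 0.0
    if a == b:
        father, mother = PEDIGREE[a]
        return 0.5 * (1.0 + kinship(father, mother))
    # Recurse on the deeper individual, which cannot be an ancestor of the other.
    if depth(a) < depth(b):
        a, b = b, a
    father, mother = PEDIGREE[a]
    return 0.5 * (kinship(father, b) + kinship(mother, b))


def inbreeding(ind):
    """Inbreeding coefficient of ind: the kinship of its two parents."""
    father, mother = PEDIGREE[ind]
    return kinship(father, mother)


if __name__ == "__main__":
    print("kinship(3, 4) =", kinship(3, 4))
    print("inbreeding(5) =", inbreeding(5))
```

In this toy pedigree the full siblings 3 and 4 have kinship 0.25, which is also the inbreeding coefficient of their child 5.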
format Online
Article
Text
id pubmed-4431039
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4431039 2015-05-15 GENLIB: an R package for the analysis of genealogical data Gauvin, Héloïse Lefebvre, Jean-François Moreau, Claudia Lavoie, Eve-Marie Labuda, Damian Vézina, Hélène Roy-Gagnon, Marie-Hélène BMC Bioinformatics Software BioMed Central 2015-05-15 /pmc/articles/PMC4431039/ /pubmed/25971991 http://dx.doi.org/10.1186/s12859-015-0581-5 Text en © Gauvin et al.; licensee BioMed Central. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title GENLIB: an R package for the analysis of genealogical data
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4431039/
https://www.ncbi.nlm.nih.gov/pubmed/25971991
http://dx.doi.org/10.1186/s12859-015-0581-5