sim1000G: a user-friendly genetic variant simulator in R for unrelated individuals and family-based designs
BACKGROUND: Simulation of genetic variants data is frequently required for the evaluation of statistical methods in the fields of human and animal genetics. Although a number of high-quality genetic simulators have been developed, many of them require advanced knowledge in population genetics or in...
Main authors: | Dimitromanolakis, Apostolos; Xu, Jingxiong; Krol, Agnieszka; Briollais, Laurent |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2019 |
Subjects: | Software |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6332552/ https://www.ncbi.nlm.nih.gov/pubmed/30646839 http://dx.doi.org/10.1186/s12859-019-2611-1 |
_version_ | 1783387376881500160 |
---|---|
author | Dimitromanolakis, Apostolos Xu, Jingxiong Krol, Agnieszka Briollais, Laurent |
author_sort | Dimitromanolakis, Apostolos |
collection | PubMed |
description | BACKGROUND: Simulation of genetic variants data is frequently required for the evaluation of statistical methods in the fields of human and animal genetics. Although a number of high-quality genetic simulators have been developed, many of them require advanced knowledge in population genetics or in computation to be used effectively. In addition, generating simulated data in the context of family-based studies demands sophisticated methods and advanced computer programming. RESULTS: To address these issues, we propose a new user-friendly and integrated R package, sim1000G, which simulates variants in genomic regions among unrelated individuals or among families. The only input needed is a raw phased Variant Call Format (VCF) file. Haplotypes are extracted to compute linkage disequilibrium (LD) in the simulated genomic regions and for the generation of new genotype data among unrelated individuals. The covariance across variants is used to preserve the LD structure of the original population. Pedigrees of arbitrary sizes are generated by modeling recombination events with sim1000G. To illustrate the application of sim1000G, various scenarios are presented assuming unrelated individuals from a single population or two distinct populations, or alternatively for three-generation pedigree data. Sim1000G can capture allele frequency diversity, short and long-range linkage disequilibrium (LD) patterns and subtle population differences in LD structure without the need of any tuning parameters. CONCLUSION: Sim1000G fills a gap in the vast area of genetic variants simulators by its simplicity and independence from external tools. Currently, it is one of the few simulation packages completely integrated into R and able to simulate multiple genetic variants among unrelated individuals and within families. Its implementation will facilitate the application and development of computational methods for association studies with both rare and common variants. 
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-019-2611-1) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-6332552 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6332552 2019-01-16 sim1000G: a user-friendly genetic variant simulator in R for unrelated individuals and family-based designs. Dimitromanolakis, Apostolos; Xu, Jingxiong; Krol, Agnieszka; Briollais, Laurent. BMC Bioinformatics. Software. BioMed Central 2019-01-15 /pmc/articles/PMC6332552/ /pubmed/30646839 http://dx.doi.org/10.1186/s12859-019-2611-1 Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | sim1000G: a user-friendly genetic variant simulator in R for unrelated individuals and family-based designs |
topic | Software |
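
The description in this record states that haplotypes are extracted to compute linkage disequilibrium (LD) between variants. As a rough illustration of that idea only, the sketch below computes alternate-allele frequencies and the standard pairwise r² statistic from a small set of phased haplotypes. It is not sim1000G's code or API: the function names `allele_freq` and `ld_r2` and the toy `HAPLOTYPES` data are hypothetical, and Python is used here purely for illustration although the package itself is written in R.

```python
def allele_freq(haplotypes, j):
    """Frequency of the alternate allele (coded 1) at variant column j."""
    return sum(h[j] for h in haplotypes) / len(haplotypes)


def ld_r2(haplotypes, i, j):
    """Pairwise LD between variants i and j as r^2 = D^2 / (p_i q_i p_j q_j)."""
    n = len(haplotypes)
    p_i = allele_freq(haplotypes, i)
    p_j = allele_freq(haplotypes, j)
    # Frequency of haplotypes carrying the alternate allele at both variants.
    p_ij = sum(h[i] * h[j] for h in haplotypes) / n
    d = p_ij - p_i * p_j  # LD coefficient D
    denom = p_i * (1 - p_i) * p_j * (1 - p_j)
    if denom == 0:
        # r^2 is undefined when either variant is monomorphic.
        return float("nan")
    return d * d / denom


# Hypothetical toy data: 8 phased haplotypes (rows) x 3 biallelic variants
# (columns), with 0 = reference allele and 1 = alternate allele.
HAPLOTYPES = [
    (0, 0, 1),
    (0, 0, 0),
    (1, 1, 0),
    (1, 1, 1),
    (0, 0, 1),
    (1, 1, 0),
    (0, 1, 0),
    (1, 0, 1),
]

if __name__ == "__main__":
    n_variants = len(HAPLOTYPES[0])
    for i in range(n_variants):
        for j in range(i + 1, n_variants):
            print(i, j, ld_r2(HAPLOTYPES, i, j))
```

In a real analysis the haplotype rows would come from a phased VCF file rather than a hard-coded list, and a simulator would aim to generate new haplotypes whose pairwise LD values stay close to those of the source population.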