Modern simulation utilities for genetic analysis
BACKGROUND: Statistical geneticists employ simulation to estimate the power of proposed studies, test new analysis tools, and evaluate properties of causal models. Although there are existing trait simulators, there is ample room for modernization. For example, most phenotype simulators are limited...
Main Authors: | Ji, Sarah S.; German, Christopher A.; Lange, Kenneth; Sinsheimer, Janet S.; Zhou, Hua; Zhou, Jin; Sobel, Eric M. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2021 |
Subjects: | Software |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8091532/ https://www.ncbi.nlm.nih.gov/pubmed/33941078 http://dx.doi.org/10.1186/s12859-021-04086-8 |
_version_ | 1783687502368866304 |
---|---|
author | Ji, Sarah S. German, Christopher A. Lange, Kenneth Sinsheimer, Janet S. Zhou, Hua Zhou, Jin Sobel, Eric M. |
author_sort | Ji, Sarah S. |
collection | PubMed |
description | BACKGROUND: Statistical geneticists employ simulation to estimate the power of proposed studies, test new analysis tools, and evaluate properties of causal models. Although there are existing trait simulators, there is ample room for modernization. For example, most phenotype simulators are limited to Gaussian traits or traits transformable to normality, while ignoring qualitative traits and realistic, non-normal trait distributions. Also, modern computer languages, such as Julia, that accommodate parallelization and cloud-based computing are now mainstream but rarely used in older applications. To meet the challenges of contemporary big studies, it is important for geneticists to adopt new computational tools. RESULTS: We present TraitSimulation, an open-source Julia package that makes it trivial to quickly simulate phenotypes under a variety of genetic architectures. This package is integrated into our OpenMendel suite for easy downstream analyses. Julia was purpose-built for scientific programming and provides tremendous speed and memory efficiency, easy access to multi-CPU and GPU hardware, and to distributed and cloud-based parallelization. TraitSimulation is designed to encourage flexible trait simulation, including via the standard devices of applied statistics, generalized linear models (GLMs) and generalized linear mixed models (GLMMs). TraitSimulation also accommodates many study designs: unrelateds, sibships, pedigrees, or a mixture of all three. (Of course, for data with pedigrees or cryptic relationships, the simulation process must include the genetic dependencies among the individuals.) We consider an assortment of trait models and study designs to illustrate integrated simulation and analysis pipelines. Step-by-step instructions for these analyses are available in our electronic Jupyter notebooks on Github. These interactive notebooks are ideal for reproducible research. 
CONCLUSION: The TraitSimulation package has three main advantages. (1) It leverages the computational efficiency and ease of use of Julia to provide extremely fast, straightforward simulation of even the most complex genetic models, including GLMs and GLMMs. (2) It can be operated entirely within, but is not limited to, the integrated analysis pipeline of OpenMendel. And finally (3), by allowing a wider range of more realistic phenotype models, TraitSimulation brings power calculations and diagnostic tools closer to what investigators might see in real-world analyses. |
format | Online Article Text |
id | pubmed-8091532 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-8091532 2021-05-04 Modern simulation utilities for genetic analysis Ji, Sarah S. German, Christopher A. Lange, Kenneth Sinsheimer, Janet S. Zhou, Hua Zhou, Jin Sobel, Eric M. BMC Bioinformatics Software BioMed Central 2021-05-03 /pmc/articles/PMC8091532/ /pubmed/33941078 http://dx.doi.org/10.1186/s12859-021-04086-8 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Software Ji, Sarah S. German, Christopher A. Lange, Kenneth Sinsheimer, Janet S. Zhou, Hua Zhou, Jin Sobel, Eric M. Modern simulation utilities for genetic analysis |
title | Modern simulation utilities for genetic analysis |
topic | Software |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8091532/ https://www.ncbi.nlm.nih.gov/pubmed/33941078 http://dx.doi.org/10.1186/s12859-021-04086-8 |