PGsim: A Comprehensive and Highly Customizable Personal Genome Simulator
Although genome sequencing has become increasingly popular, the simulation of individual genomes is still important. This is because sequencing a large number of individual genomes is costly and genome data with extreme and boundary conditions, such as fatal genetic defects, are difficult to obtain....
Main Authors: | Juan, Liran; Wang, Yongtian; Jiang, Jingyi; Yang, Qi; Jiang, Qinghua; Wang, Yadong |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2020 |
Subjects: | Bioengineering and Biotechnology |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6997238/ https://www.ncbi.nlm.nih.gov/pubmed/32047747 http://dx.doi.org/10.3389/fbioe.2020.00028 |
_version_ | 1783493654393913344 |
---|---|
author | Juan, Liran; Wang, Yongtian; Jiang, Jingyi; Yang, Qi; Jiang, Qinghua; Wang, Yadong |
author_facet | Juan, Liran; Wang, Yongtian; Jiang, Jingyi; Yang, Qi; Jiang, Qinghua; Wang, Yadong |
author_sort | Juan, Liran |
collection | PubMed |
description | Although genome sequencing has become increasingly popular, the simulation of individual genomes is still important. This is because sequencing a large number of individual genomes is costly and genome data with extreme and boundary conditions, such as fatal genetic defects, are difficult to obtain. Privacy and legal barriers also prevent many applications of real data. Large sequencing projects in recent years have provided a deeper understanding of the human genome. However, there is a lack of tools to leverage known data to simulate personal genomes as real as possible. Here, we designed and developed PGsim, a comprehensive and highly customizable individual genome simulator, that fully uses existing knowledge, such as variant allele frequencies in global or world main populations, mutation probability differences between protein-coding regions and non-coding regions, transition/transversion (Ti/Tv) ratios, Indel incidence, Indel length distribution, structural variation sites, and pathogenic mutation sites. Users can flexibly control the proportion and quantity of known variants, common variants, novel variants in both coding and non-coding regions, and special variants through detailed parameter settings. To ensure that the simulated personal genome has sufficient randomness, PGsim makes the generated variants more real and reliable in terms of variant distribution, proportion, and population characteristics. PGsim is able to employ a huge volume database as background data to simulate personal genomes and does not require SQL database support. Users can easily change the variant databases used as needed. As a Perl script, there is no obstacle to running PGsim on any version of the MAC OS or Linux systems, and no libraries, packages, interpreters, compilers, or other dependencies need to be installed in advance. The PGsim tool is publicly available at https://github.com/lrjuan/PGsim. |
format | Online Article Text |
id | pubmed-6997238 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-6997238 2020-02-11 PGsim: A Comprehensive and Highly Customizable Personal Genome Simulator Juan, Liran Wang, Yongtian Jiang, Jingyi Yang, Qi Jiang, Qinghua Wang, Yadong Front Bioeng Biotechnol Bioengineering and Biotechnology Although genome sequencing has become increasingly popular, the simulation of individual genomes is still important. This is because sequencing a large number of individual genomes is costly and genome data with extreme and boundary conditions, such as fatal genetic defects, are difficult to obtain. Privacy and legal barriers also prevent many applications of real data. Large sequencing projects in recent years have provided a deeper understanding of the human genome. However, there is a lack of tools to leverage known data to simulate personal genomes as real as possible. Here, we designed and developed PGsim, a comprehensive and highly customizable individual genome simulator, that fully uses existing knowledge, such as variant allele frequencies in global or world main populations, mutation probability differences between protein-coding regions and non-coding regions, transition/transversion (Ti/Tv) ratios, Indel incidence, Indel length distribution, structural variation sites, and pathogenic mutation sites. Users can flexibly control the proportion and quantity of known variants, common variants, novel variants in both coding and non-coding regions, and special variants through detailed parameter settings. To ensure that the simulated personal genome has sufficient randomness, PGsim makes the generated variants more real and reliable in terms of variant distribution, proportion, and population characteristics. PGsim is able to employ a huge volume database as background data to simulate personal genomes and does not require SQL database support. Users can easily change the variant databases used as needed. As a Perl script, there is no obstacle to running PGsim on any version of the MAC OS or Linux systems, and no libraries, packages, interpreters, compilers, or other dependencies need to be installed in advance. The PGsim tool is publicly available at https://github.com/lrjuan/PGsim. Frontiers Media S.A. 2020-01-28 /pmc/articles/PMC6997238/ /pubmed/32047747 http://dx.doi.org/10.3389/fbioe.2020.00028 Text en Copyright © 2020 Juan, Wang, Jiang, Yang, Jiang and Wang. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Bioengineering and Biotechnology Juan, Liran Wang, Yongtian Jiang, Jingyi Yang, Qi Jiang, Qinghua Wang, Yadong PGsim: A Comprehensive and Highly Customizable Personal Genome Simulator |
title | PGsim: A Comprehensive and Highly Customizable Personal Genome Simulator |
title_full | PGsim: A Comprehensive and Highly Customizable Personal Genome Simulator |
title_fullStr | PGsim: A Comprehensive and Highly Customizable Personal Genome Simulator |
title_full_unstemmed | PGsim: A Comprehensive and Highly Customizable Personal Genome Simulator |
title_short | PGsim: A Comprehensive and Highly Customizable Personal Genome Simulator |
title_sort | pgsim: a comprehensive and highly customizable personal genome simulator |
topic | Bioengineering and Biotechnology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6997238/ https://www.ncbi.nlm.nih.gov/pubmed/32047747 http://dx.doi.org/10.3389/fbioe.2020.00028 |
work_keys_str_mv | AT juanliran pgsimacomprehensiveandhighlycustomizablepersonalgenomesimulator AT wangyongtian pgsimacomprehensiveandhighlycustomizablepersonalgenomesimulator AT jiangjingyi pgsimacomprehensiveandhighlycustomizablepersonalgenomesimulator AT yangqi pgsimacomprehensiveandhighlycustomizablepersonalgenomesimulator AT jiangqinghua pgsimacomprehensiveandhighlycustomizablepersonalgenomesimulator AT wangyadong pgsimacomprehensiveandhighlycustomizablepersonalgenomesimulator |
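
The description above mentions two kinds of prior knowledge a personal genome simulator can use: population allele frequencies for known variant sites and a transition/transversion (Ti/Tv) ratio for novel substitutions. The following is a minimal Python sketch of how such sampling could work in principle. It is not PGsim's code (PGsim is a Perl script), and the function names `novel_snv_alt` and `known_site_genotype`, the Ti/Tv value of 2.0, and the independent-allele (Hardy-Weinberg) assumption are all illustrative choices, not taken from the record.

```python
import random

# Transition partner of each base (purine<->purine, pyrimidine<->pyrimidine).
TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}


def novel_snv_alt(ref, titv_ratio, rng):
    """Pick an alternate base so that transitions:transversions ~ titv_ratio.

    Hypothetical helper for illustration only.
    """
    p_transition = titv_ratio / (1.0 + titv_ratio)
    if rng.random() < p_transition:
        return TRANSITION[ref]
    # Each base has exactly two transversion targets; choose one uniformly.
    transversions = [b for b in "ACGT" if b not in (ref, TRANSITION[ref])]
    return rng.choice(transversions)


def known_site_genotype(alt_freq, rng):
    """Draw two alleles independently from the population alt-allele frequency.

    Assumes Hardy-Weinberg proportions; 1 marks the alternate allele.
    """
    return tuple(int(rng.random() < alt_freq) for _ in range(2))


# Usage: simulate many novel substitutions at reference base "A" and
# measure the resulting Ti/Tv ratio (2.0 is an assumed target value).
rng = random.Random(0)
alts = [novel_snv_alt("A", 2.0, rng) for _ in range(10000)]
n_transitions = alts.count("G")
observed_titv = n_transitions / (len(alts) - n_transitions)
```

With enough draws, `observed_titv` should land close to the requested ratio, and a site with alternate-allele frequency `f` yields a homozygous-alternate genotype with probability `f * f` under the stated assumption.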