
PGsim: A Comprehensive and Highly Customizable Personal Genome Simulator

Although genome sequencing has become increasingly popular, the simulation of individual genomes is still important. This is because sequencing a large number of individual genomes is costly and genome data with extreme and boundary conditions, such as fatal genetic defects, are difficult to obtain....


Bibliographic Details
Main Authors: Juan, Liran; Wang, Yongtian; Jiang, Jingyi; Yang, Qi; Jiang, Qinghua; Wang, Yadong
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6997238/
https://www.ncbi.nlm.nih.gov/pubmed/32047747
http://dx.doi.org/10.3389/fbioe.2020.00028
_version_ 1783493654393913344
author Juan, Liran
Wang, Yongtian
Jiang, Jingyi
Yang, Qi
Jiang, Qinghua
Wang, Yadong
collection PubMed
description Although genome sequencing has become increasingly popular, the simulation of individual genomes is still important. This is because sequencing a large number of individual genomes is costly, and genome data with extreme and boundary conditions, such as fatal genetic defects, are difficult to obtain. Privacy and legal barriers also prevent many applications of real data. Large sequencing projects in recent years have provided a deeper understanding of the human genome. However, there is a lack of tools that leverage known data to simulate personal genomes as realistically as possible. Here, we designed and developed PGsim, a comprehensive and highly customizable individual genome simulator that makes full use of existing knowledge, such as variant allele frequencies in the global population or in major world populations, mutation probability differences between protein-coding and non-coding regions, transition/transversion (Ti/Tv) ratios, indel incidence, indel length distribution, structural variation sites, and pathogenic mutation sites. Through detailed parameter settings, users can flexibly control the proportion and quantity of known variants, common variants, novel variants in both coding and non-coding regions, and special variants. While ensuring that the simulated personal genome has sufficient randomness, PGsim makes the generated variants more realistic and reliable in terms of variant distribution, proportion, and population characteristics. PGsim can use a very large database as background data to simulate personal genomes and does not require SQL database support. Users can easily change the variant databases used as needed. Because PGsim is a Perl script, it can be run on any version of macOS or Linux, and no libraries, packages, interpreters, compilers, or other dependencies need to be installed in advance. The PGsim tool is publicly available at https://github.com/lrjuan/PGsim.
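The abstract names two of the statistics such a simulator draws on: population allele frequencies for known variants and a target Ti/Tv ratio for novel ones. The Python sketch below illustrates only those two ideas. It is not PGsim's code (PGsim is a Perl script), and every name in it (`novel_snv`, `sample_genotype`, `simulate`, the tuple layout of `known_variants`) is hypothetical, as are the assumptions of independent haplotype draws, 0-based positions, and a reference made only of A, C, G, and T.

```python
import random

# Transition partner for each base (purine <-> purine, pyrimidine <-> pyrimidine).
TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}
BASES = "ACGT"


def novel_snv(ref, ti_tv, rng):
    """Choose an alternate base for `ref`.

    A transition is chosen with probability ti_tv / (ti_tv + 1), so that over
    many draws the transition/transversion ratio approaches `ti_tv`.
    """
    if rng.random() < ti_tv / (ti_tv + 1.0):
        return TRANSITION[ref]
    transversions = [b for b in BASES if b != ref and b != TRANSITION[ref]]
    return rng.choice(transversions)


def sample_genotype(alt_freq, rng):
    """Return the number of alternate alleles (0, 1, or 2) for a known variant.

    Assumption: the two haplotypes are independent draws at frequency alt_freq.
    """
    return sum(rng.random() < alt_freq for _ in range(2))


def simulate(known_variants, n_novel, ref_seq, ti_tv=2.0, seed=0):
    """Return sorted (pos, ref, alt, alt_allele_count, label) tuples.

    known_variants: iterable of (pos, ref, alt, alt_freq), positions 0-based.
    n_novel: number of novel heterozygous SNVs to place at unused positions.
    """
    rng = random.Random(seed)
    calls = []
    for pos, ref, alt, freq in known_variants:
        count = sample_genotype(freq, rng)
        if count > 0:
            calls.append((pos, ref, alt, count, "known"))
    used = {pos for pos, *_ in known_variants}
    candidates = [i for i in range(len(ref_seq)) if i not in used]
    for pos in rng.sample(candidates, n_novel):
        ref = ref_seq[pos]
        calls.append((pos, ref, novel_snv(ref, ti_tv, rng), 1, "novel"))
    return sorted(calls)
```

A call such as `simulate([(3, "A", "G", 0.4)], n_novel=2, ref_seq="ACGTACGTAC")` returns a small list of variant tuples. Region-specific mutation rates, indels, and structural variants, which the abstract also mentions, are left out of this sketch.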
format Online
Article
Text
id pubmed-6997238
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-6997238 2020-02-11 PGsim: A Comprehensive and Highly Customizable Personal Genome Simulator Juan, Liran Wang, Yongtian Jiang, Jingyi Yang, Qi Jiang, Qinghua Wang, Yadong Front Bioeng Biotechnol Bioengineering and Biotechnology Frontiers Media S.A. 2020-01-28 /pmc/articles/PMC6997238/ /pubmed/32047747 http://dx.doi.org/10.3389/fbioe.2020.00028 Text en Copyright © 2020 Juan, Wang, Jiang, Yang, Jiang and Wang. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title PGsim: A Comprehensive and Highly Customizable Personal Genome Simulator
topic Bioengineering and Biotechnology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6997238/
https://www.ncbi.nlm.nih.gov/pubmed/32047747
http://dx.doi.org/10.3389/fbioe.2020.00028