Expanding the stdpopsim species catalog, and lessons learned for realistic genome simulations
Simulation is a key tool in population genetics for both methods development and empirical research, but producing simulations that recapitulate the main features of genomic datasets remains a major obstacle. Today, more realistic simulations are possible thanks to large increases in the quantity an...
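Main authors: | Lauterbur, M Elise; Cavassim, Maria Izabel A; Gladstein, Ariella L; Gower, Graham; Pope, Nathaniel S; Tsambos, Georgia; Adrion, Jeffrey; Belsare, Saurabh; Biddanda, Arjun; Caudill, Victoria; Cury, Jean; Echevarria, Ignacio; Haller, Benjamin C; Hasan, Ahmed R; Huang, Xin; Iasi, Leonardo Nicola Martin; Noskova, Ekaterina; Obsteter, Jana; Pavinato, Vitor Antonio Correa; Pearson, Alice; Peede, David; Perez, Manolo F; Rodrigues, Murillo F; Smith, Chris CR; Spence, Jeffrey P; Teterina, Anastasia; Tittes, Silas; Unneberg, Per; Vazquez, Juan Manuel; Waples, Ryan K; Wohns, Anthony Wilder; Wong, Yan; Baumdicker, Franz; Cartwright, Reed A; Gorjanc, Gregor; Gutenkunst, Ryan N; Kelleher, Jerome; Kern, Andrew D; Ragsdale, Aaron P; Ralph, Peter L; Schrider, Daniel R; Gronau, Ilan |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | eLife Sciences Publications, Ltd, 2023 |
Subjects: | Genetics and Genomics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10328510/ https://www.ncbi.nlm.nih.gov/pubmed/37342968 http://dx.doi.org/10.7554/eLife.84874 |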
_version_ | 1785069813995929600 |
---|---|
author | Lauterbur, M Elise Cavassim, Maria Izabel A Gladstein, Ariella L Gower, Graham Pope, Nathaniel S Tsambos, Georgia Adrion, Jeffrey Belsare, Saurabh Biddanda, Arjun Caudill, Victoria Cury, Jean Echevarria, Ignacio Haller, Benjamin C Hasan, Ahmed R Huang, Xin Iasi, Leonardo Nicola Martin Noskova, Ekaterina Obsteter, Jana Pavinato, Vitor Antonio Correa Pearson, Alice Peede, David Perez, Manolo F Rodrigues, Murillo F Smith, Chris CR Spence, Jeffrey P Teterina, Anastasia Tittes, Silas Unneberg, Per Vazquez, Juan Manuel Waples, Ryan K Wohns, Anthony Wilder Wong, Yan Baumdicker, Franz Cartwright, Reed A Gorjanc, Gregor Gutenkunst, Ryan N Kelleher, Jerome Kern, Andrew D Ragsdale, Aaron P Ralph, Peter L Schrider, Daniel R Gronau, Ilan |
author_sort | Lauterbur, M Elise |
collection | PubMed |
description | Simulation is a key tool in population genetics for both methods development and empirical research, but producing simulations that recapitulate the main features of genomic datasets remains a major obstacle. Today, more realistic simulations are possible thanks to large increases in the quantity and quality of available genetic data, and the sophistication of inference and simulation software. However, implementing these simulations still requires substantial time and specialized knowledge. These challenges are especially pronounced for simulating genomes for species that are not well-studied, since it is not always clear what information is required to produce simulations with a level of realism sufficient to confidently answer a given question. The community-developed framework stdpopsim seeks to lower this barrier by facilitating the simulation of complex population genetic models using up-to-date information. The initial version of stdpopsim focused on establishing this framework using six well-characterized model species (Adrion et al., 2020). Here, we report on major improvements made in the new release of stdpopsim (version 0.2), which includes a significant expansion of the species catalog and substantial additions to simulation capabilities. Features added to improve the realism of the simulated genomes include non-crossover recombination and provision of species-specific genomic annotations. Through community-driven efforts, we expanded the number of species in the catalog more than threefold and broadened coverage across the tree of life. During the process of expanding the catalog, we have identified common sticking points and developed the best practices for setting up genome-scale simulations. We describe the input data required for generating a realistic simulation, suggest good practices for obtaining the relevant information from the literature, and discuss common pitfalls and major considerations. These improvements to stdpopsim aim to further promote the use of realistic whole-genome population genetic simulations, especially in non-model organisms, making them available, transparent, and accessible to everyone. |
format | Online Article Text |
id | pubmed-10328510 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | eLife Sciences Publications, Ltd |
record_format | MEDLINE/PubMed |
spelling | pubmed-10328510 2023-07-08 Expanding the stdpopsim species catalog, and lessons learned for realistic genome simulations Lauterbur, M Elise Cavassim, Maria Izabel A Gladstein, Ariella L Gower, Graham Pope, Nathaniel S Tsambos, Georgia Adrion, Jeffrey Belsare, Saurabh Biddanda, Arjun Caudill, Victoria Cury, Jean Echevarria, Ignacio Haller, Benjamin C Hasan, Ahmed R Huang, Xin Iasi, Leonardo Nicola Martin Noskova, Ekaterina Obsteter, Jana Pavinato, Vitor Antonio Correa Pearson, Alice Peede, David Perez, Manolo F Rodrigues, Murillo F Smith, Chris CR Spence, Jeffrey P Teterina, Anastasia Tittes, Silas Unneberg, Per Vazquez, Juan Manuel Waples, Ryan K Wohns, Anthony Wilder Wong, Yan Baumdicker, Franz Cartwright, Reed A Gorjanc, Gregor Gutenkunst, Ryan N Kelleher, Jerome Kern, Andrew D Ragsdale, Aaron P Ralph, Peter L Schrider, Daniel R Gronau, Ilan eLife Genetics and Genomics Simulation is a key tool in population genetics for both methods development and empirical research, but producing simulations that recapitulate the main features of genomic datasets remains a major obstacle. Today, more realistic simulations are possible thanks to large increases in the quantity and quality of available genetic data, and the sophistication of inference and simulation software. However, implementing these simulations still requires substantial time and specialized knowledge. These challenges are especially pronounced for simulating genomes for species that are not well-studied, since it is not always clear what information is required to produce simulations with a level of realism sufficient to confidently answer a given question. The community-developed framework stdpopsim seeks to lower this barrier by facilitating the simulation of complex population genetic models using up-to-date information. The initial version of stdpopsim focused on establishing this framework using six well-characterized model species (Adrion et al., 2020). Here, we report on major improvements made in the new release of stdpopsim (version 0.2), which includes a significant expansion of the species catalog and substantial additions to simulation capabilities. Features added to improve the realism of the simulated genomes include non-crossover recombination and provision of species-specific genomic annotations. Through community-driven efforts, we expanded the number of species in the catalog more than threefold and broadened coverage across the tree of life. During the process of expanding the catalog, we have identified common sticking points and developed the best practices for setting up genome-scale simulations. We describe the input data required for generating a realistic simulation, suggest good practices for obtaining the relevant information from the literature, and discuss common pitfalls and major considerations. These improvements to stdpopsim aim to further promote the use of realistic whole-genome population genetic simulations, especially in non-model organisms, making them available, transparent, and accessible to everyone. eLife Sciences Publications, Ltd 2023-06-21 /pmc/articles/PMC10328510/ /pubmed/37342968 http://dx.doi.org/10.7554/eLife.84874 Text en © 2023, Lauterbur et al https://creativecommons.org/licenses/by/4.0/ This article is distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use and redistribution provided that the original author and source are credited. |
title | Expanding the stdpopsim species catalog, and lessons learned for realistic genome simulations |
topic | Genetics and Genomics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10328510/ https://www.ncbi.nlm.nih.gov/pubmed/37342968 http://dx.doi.org/10.7554/eLife.84874 |