Benchmarking database systems for Genomic Selection implementation
MOTIVATION: With high-throughput genotyping systems now available, it has become feasible to fully integrate genotyping information into breeding programs. To make use of this information effectively requires DNA extraction facilities and marker production facilities that can efficiently deploy the...
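Main Authors: | Nti-Addae, Yaw; Matthews, Dave; Ulat, Victor Jun; Syed, Raza; Sempéré, Guilhem; Pétel, Adrien; Renner, Jon; Larmande, Pierre; Guignon, Valentin; Jones, Elizabeth; Robbins, Kelly |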
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2019 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6737464/ https://www.ncbi.nlm.nih.gov/pubmed/31508797 http://dx.doi.org/10.1093/database/baz096 |
_version_ | 1783450663049494528 |
---|---|
author | Nti-Addae, Yaw; Matthews, Dave; Ulat, Victor Jun; Syed, Raza; Sempéré, Guilhem; Pétel, Adrien; Renner, Jon; Larmande, Pierre; Guignon, Valentin; Jones, Elizabeth; Robbins, Kelly |
author_facet | Nti-Addae, Yaw; Matthews, Dave; Ulat, Victor Jun; Syed, Raza; Sempéré, Guilhem; Pétel, Adrien; Renner, Jon; Larmande, Pierre; Guignon, Valentin; Jones, Elizabeth; Robbins, Kelly |
author_sort | Nti-Addae, Yaw |
collection | PubMed |
description | MOTIVATION: With high-throughput genotyping systems now available, it has become feasible to fully integrate genotyping information into breeding programs. To make use of this information effectively requires DNA extraction facilities and marker production facilities that can efficiently deploy the desired set of markers across samples with a rapid turnaround time that allows for selection before crosses need to be made. In reality, breeders often have a short window of time to make decisions by the time they are able to collect all their phenotyping data and receive corresponding genotyping data. This presents a challenge to organize information and utilize it in downstream analyses to support decisions made by breeders. In order to implement genomic selection routinely as part of breeding programs, one would need an efficient genotyping data storage system. We selected and benchmarked six popular open-source data storage systems, including relational database management and columnar storage systems. RESULTS: We found that data extract times are greatly influenced by the orientation in which genotype data is stored in a system. HDF5 consistently performed best, in part because it can more efficiently work with both orientations of the allele matrix. AVAILABILITY: http://gobiin1.bti.cornell.edu:6083/projects/GBM/repos/benchmarking/browse |
format | Online Article Text |
id | pubmed-6737464 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-6737464 2019-09-16 Benchmarking database systems for Genomic Selection implementation Nti-Addae, Yaw Matthews, Dave Ulat, Victor Jun Syed, Raza Sempéré, Guilhem Pétel, Adrien Renner, Jon Larmande, Pierre Guignon, Valentin Jones, Elizabeth Robbins, Kelly Database (Oxford) Review MOTIVATION: With high-throughput genotyping systems now available, it has become feasible to fully integrate genotyping information into breeding programs. To make use of this information effectively requires DNA extraction facilities and marker production facilities that can efficiently deploy the desired set of markers across samples with a rapid turnaround time that allows for selection before crosses need to be made. In reality, breeders often have a short window of time to make decisions by the time they are able to collect all their phenotyping data and receive corresponding genotyping data. This presents a challenge to organize information and utilize it in downstream analyses to support decisions made by breeders. In order to implement genomic selection routinely as part of breeding programs, one would need an efficient genotyping data storage system. We selected and benchmarked six popular open-source data storage systems, including relational database management and columnar storage systems. RESULTS: We found that data extract times are greatly influenced by the orientation in which genotype data is stored in a system. HDF5 consistently performed best, in part because it can more efficiently work with both orientations of the allele matrix. AVAILABILITY: http://gobiin1.bti.cornell.edu:6083/projects/GBM/repos/benchmarking/browse Oxford University Press 2019-09-11 /pmc/articles/PMC6737464/ /pubmed/31508797 http://dx.doi.org/10.1093/database/baz096 Text en © The Author(s) 2019. Published by Oxford University Press.
http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Review Nti-Addae, Yaw Matthews, Dave Ulat, Victor Jun Syed, Raza Sempéré, Guilhem Pétel, Adrien Renner, Jon Larmande, Pierre Guignon, Valentin Jones, Elizabeth Robbins, Kelly Benchmarking database systems for Genomic Selection implementation |
title | Benchmarking database systems for Genomic Selection implementation |
title_full | Benchmarking database systems for Genomic Selection implementation |
title_fullStr | Benchmarking database systems for Genomic Selection implementation |
title_full_unstemmed | Benchmarking database systems for Genomic Selection implementation |
title_short | Benchmarking database systems for Genomic Selection implementation |
title_sort | benchmarking database systems for genomic selection implementation |
topic | Review |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6737464/ https://www.ncbi.nlm.nih.gov/pubmed/31508797 http://dx.doi.org/10.1093/database/baz096 |
work_keys_str_mv | AT ntiaddaeyaw benchmarkingdatabasesystemsforgenomicselectionimplementation AT matthewsdave benchmarkingdatabasesystemsforgenomicselectionimplementation AT ulatvictorjun benchmarkingdatabasesystemsforgenomicselectionimplementation AT syedraza benchmarkingdatabasesystemsforgenomicselectionimplementation AT sempereguilhem benchmarkingdatabasesystemsforgenomicselectionimplementation AT peteladrien benchmarkingdatabasesystemsforgenomicselectionimplementation AT rennerjon benchmarkingdatabasesystemsforgenomicselectionimplementation AT larmandepierre benchmarkingdatabasesystemsforgenomicselectionimplementation AT guignonvalentin benchmarkingdatabasesystemsforgenomicselectionimplementation AT joneselizabeth benchmarkingdatabasesystemsforgenomicselectionimplementation AT robbinskelly benchmarkingdatabasesystemsforgenomicselectionimplementation |