The variant call format provides efficient and robust storage of GWAS summary statistics

GWAS summary statistics are fundamental for a variety of research applications yet no common storage format has been widely adopted. Existing tabular formats ambiguously or incompletely store information about genetic variants and associations, lack essential metadata and are typically not indexed y...

Bibliographic Details
Main authors: Lyon, Matthew S., Andrews, Shea J., Elsworth, Ben, Gaunt, Tom R., Hemani, Gibran, Marcora, Edoardo
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7805039/
https://www.ncbi.nlm.nih.gov/pubmed/33441155
http://dx.doi.org/10.1186/s13059-020-02248-0
_version_ 1783636237595181056
author Lyon, Matthew S.
Andrews, Shea J.
Elsworth, Ben
Gaunt, Tom R.
Hemani, Gibran
Marcora, Edoardo
author_sort Lyon, Matthew S.
collection PubMed
description GWAS summary statistics are fundamental for a variety of research applications yet no common storage format has been widely adopted. Existing tabular formats ambiguously or incompletely store information about genetic variants and associations, lack essential metadata and are typically not indexed yielding poor query performance and increasing the possibility of errors in data interpretation and post-GWAS analyses. To address these issues, we adapted the variant call format to store GWAS summary statistics (GWAS-VCF) and developed open-source tools to use this format in downstream analyses. We provide open access to over 10,000 complete GWAS summary datasets converted to this format (https://gwas.mrcieu.ac.uk). SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13059-020-02248-0.
format Online
Article
Text
id pubmed-7805039
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7805039 2021-01-14 The variant call format provides efficient and robust storage of GWAS summary statistics Lyon, Matthew S. Andrews, Shea J. Elsworth, Ben Gaunt, Tom R. Hemani, Gibran Marcora, Edoardo Genome Biol Method GWAS summary statistics are fundamental for a variety of research applications yet no common storage format has been widely adopted. Existing tabular formats ambiguously or incompletely store information about genetic variants and associations, lack essential metadata and are typically not indexed yielding poor query performance and increasing the possibility of errors in data interpretation and post-GWAS analyses. To address these issues, we adapted the variant call format to store GWAS summary statistics (GWAS-VCF) and developed open-source tools to use this format in downstream analyses. We provide open access to over 10,000 complete GWAS summary datasets converted to this format (https://gwas.mrcieu.ac.uk). SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13059-020-02248-0. BioMed Central 2021-01-13 /pmc/articles/PMC7805039/ /pubmed/33441155 http://dx.doi.org/10.1186/s13059-020-02248-0 Text en © The Author(s) 2021 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title The variant call format provides efficient and robust storage of GWAS summary statistics
topic Method
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7805039/
https://www.ncbi.nlm.nih.gov/pubmed/33441155
http://dx.doi.org/10.1186/s13059-020-02248-0