
ukbtools: An R package to manage and query UK Biobank data

INTRODUCTION: The UK Biobank (UKB) is a resource that includes detailed health-related data on about 500,000 individuals and is available to the research community. However, several obstacles limit immediate analysis of the data: data files vary in format, may be very large, and have numerical codes...


Bibliographic Details
Main Authors: Hanscombe, Ken B., Coleman, Jonathan R. I., Traylor, Matthew, Lewis, Cathryn M.
Format: Online Article Text
Language: English
Published: Public Library of Science 2019
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6544205/
https://www.ncbi.nlm.nih.gov/pubmed/31150407
http://dx.doi.org/10.1371/journal.pone.0214311
author Hanscombe, Ken B.
Coleman, Jonathan R. I.
Traylor, Matthew
Lewis, Cathryn M.
collection PubMed
description INTRODUCTION: The UK Biobank (UKB) is a resource that includes detailed health-related data on about 500,000 individuals and is available to the research community. However, several obstacles limit immediate analysis of the data: data files vary in format, may be very large, and have numerical codes for column names. RESULTS: ukbtools removes all the upfront data wrangling required to get a single dataset for statistical analysis. All associated data files are merged into a single dataset with descriptive column names. The package also provides tools to assist in quality control by exploring the primary demographics of subsets of participants; query of disease diagnoses for one or more individuals, and estimating disease frequency relative to a reference variable; and to retrieve genetic metadata. CONCLUSION: Having a dataset with meaningful variable names, a set of UKB-specific exploratory data analysis tools, disease query functions, and a set of helper functions to explore and write genetic metadata to file, will rapidly enable UKB users to undertake their research.
format Online
Article
Text
id pubmed-6544205
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-6544205
2019-06-17
ukbtools: An R package to manage and query UK Biobank data
Hanscombe, Ken B.
Coleman, Jonathan R. I.
Traylor, Matthew
Lewis, Cathryn M.
PLoS One
Research Article
INTRODUCTION: The UK Biobank (UKB) is a resource that includes detailed health-related data on about 500,000 individuals and is available to the research community. However, several obstacles limit immediate analysis of the data: data files vary in format, may be very large, and have numerical codes for column names. RESULTS: ukbtools removes all the upfront data wrangling required to get a single dataset for statistical analysis. All associated data files are merged into a single dataset with descriptive column names. The package also provides tools to assist in quality control by exploring the primary demographics of subsets of participants; query of disease diagnoses for one or more individuals, and estimating disease frequency relative to a reference variable; and to retrieve genetic metadata. CONCLUSION: Having a dataset with meaningful variable names, a set of UKB-specific exploratory data analysis tools, disease query functions, and a set of helper functions to explore and write genetic metadata to file, will rapidly enable UKB users to undertake their research.
Public Library of Science
2019-05-31
/pmc/articles/PMC6544205/
/pubmed/31150407
http://dx.doi.org/10.1371/journal.pone.0214311
Text
en
© 2019 Hanscombe et al
http://creativecommons.org/licenses/by/4.0/
This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title ukbtools: An R package to manage and query UK Biobank data
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6544205/
https://www.ncbi.nlm.nih.gov/pubmed/31150407
http://dx.doi.org/10.1371/journal.pone.0214311