ukbtools: An R package to manage and query UK Biobank data
INTRODUCTION: The UK Biobank (UKB) is a resource that includes detailed health-related data on about 500,000 individuals and is available to the research community. However, several obstacles limit immediate analysis of the data: data files vary in format, may be very large, and have numerical codes...
Main authors: | Hanscombe, Ken B.; Coleman, Jonathan R. I.; Traylor, Matthew; Lewis, Cathryn M. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2019 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6544205/ https://www.ncbi.nlm.nih.gov/pubmed/31150407 http://dx.doi.org/10.1371/journal.pone.0214311 |
_version_ | 1783423211834179584 |
---|---|
author | Hanscombe, Ken B. Coleman, Jonathan R. I. Traylor, Matthew Lewis, Cathryn M. |
author_facet | Hanscombe, Ken B. Coleman, Jonathan R. I. Traylor, Matthew Lewis, Cathryn M. |
author_sort | Hanscombe, Ken B. |
collection | PubMed |
description | INTRODUCTION: The UK Biobank (UKB) is a resource that includes detailed health-related data on about 500,000 individuals and is available to the research community. However, several obstacles limit immediate analysis of the data: data files vary in format, may be very large, and have numerical codes for column names. RESULTS: ukbtools removes all the upfront data wrangling required to get a single dataset for statistical analysis. All associated data files are merged into a single dataset with descriptive column names. The package also provides tools to assist in quality control by exploring the primary demographics of subsets of participants; query of disease diagnoses for one or more individuals, and estimating disease frequency relative to a reference variable; and to retrieve genetic metadata. CONCLUSION: Having a dataset with meaningful variable names, a set of UKB-specific exploratory data analysis tools, disease query functions, and a set of helper functions to explore and write genetic metadata to file, will rapidly enable UKB users to undertake their research. |
format | Online Article Text |
id | pubmed-6544205 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-6544205 2019-06-17 ukbtools: An R package to manage and query UK Biobank data Hanscombe, Ken B. Coleman, Jonathan R. I. Traylor, Matthew Lewis, Cathryn M. PLoS One Research Article INTRODUCTION: The UK Biobank (UKB) is a resource that includes detailed health-related data on about 500,000 individuals and is available to the research community. However, several obstacles limit immediate analysis of the data: data files vary in format, may be very large, and have numerical codes for column names. RESULTS: ukbtools removes all the upfront data wrangling required to get a single dataset for statistical analysis. All associated data files are merged into a single dataset with descriptive column names. The package also provides tools to assist in quality control by exploring the primary demographics of subsets of participants; query of disease diagnoses for one or more individuals, and estimating disease frequency relative to a reference variable; and to retrieve genetic metadata. CONCLUSION: Having a dataset with meaningful variable names, a set of UKB-specific exploratory data analysis tools, disease query functions, and a set of helper functions to explore and write genetic metadata to file, will rapidly enable UKB users to undertake their research. Public Library of Science 2019-05-31 /pmc/articles/PMC6544205/ /pubmed/31150407 http://dx.doi.org/10.1371/journal.pone.0214311 Text en © 2019 Hanscombe et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Hanscombe, Ken B. Coleman, Jonathan R. I. Traylor, Matthew Lewis, Cathryn M. ukbtools: An R package to manage and query UK Biobank data |
title | ukbtools: An R package to manage and query UK Biobank data |
title_full | ukbtools: An R package to manage and query UK Biobank data |
title_fullStr | ukbtools: An R package to manage and query UK Biobank data |
title_full_unstemmed | ukbtools: An R package to manage and query UK Biobank data |
title_short | ukbtools: An R package to manage and query UK Biobank data |
title_sort | ukbtools: an r package to manage and query uk biobank data |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6544205/ https://www.ncbi.nlm.nih.gov/pubmed/31150407 http://dx.doi.org/10.1371/journal.pone.0214311 |
work_keys_str_mv | AT hanscombekenb ukbtoolsanrpackagetomanageandqueryukbiobankdata AT colemanjonathanri ukbtoolsanrpackagetomanageandqueryukbiobankdata AT traylormatthew ukbtoolsanrpackagetomanageandqueryukbiobankdata AT lewiscathrynm ukbtoolsanrpackagetomanageandqueryukbiobankdata |