
The Mega2R package: R tools for accessing and processing genetic data in common formats

The standalone C++ Mega2 program has been facilitating data-reformatting for linkage and association analysis programs since 2000. Support for more analysis programs has been added over time. Currently, Mega2 converts data from several different genetic data formats (including PLINK, VCF, BCF, and I...


Bibliographic Details
Main Authors: Baron, Robert V., Stickel, Justin R., Weeks, Daniel E.
Format: Online Article Text
Language: English
Published: F1000 Research Limited 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6137409/
https://www.ncbi.nlm.nih.gov/pubmed/30271589
http://dx.doi.org/10.12688/f1000research.15949.2
author Baron, Robert V.
Stickel, Justin R.
Weeks, Daniel E.
collection PubMed
description The standalone C++ Mega2 program has been facilitating data reformatting for linkage and association analysis programs since 2000. Support for more analysis programs has been added over time. Currently, Mega2 converts data from several different genetic data formats (including PLINK, VCF, BCF, and IMPUTE2) into the specific data requirements for over 40 commonly used linkage and association analysis programs (including Mendel, Merlin, Morgan, SHAPEIT, ROADTRIPS, MaCH/minimac3). Recently, Mega2 has been enhanced to use a SQLite database as an intermediate data representation. Additionally, Mega2 now stores biallelic genotype data in a highly compressed form, like that of the GenABEL R package and the PLINK binary format. Our new Mega2R package now makes it easy to load Mega2 SQLite databases directly into R as data frames. In addition, Mega2R is memory efficient, keeping its genotype data in a compressed format, portions of which are only expanded when needed. Mega2R has functions that ease the process of applying gene-based tests by looping over genes, efficiently pulling out genotypes for variants within the desired boundaries. We have also created several more functions that illustrate how to use the data frames: these permit one to run the pedgene package to carry out gene-based association tests on family data, to run the SKAT package to carry out gene-based association tests, to output the Mega2R data as a VCF file and related files (for phenotype and family data), and to convert the data frames into GenABEL format. The Mega2R package enhances GenABEL since it supports additional input data formats (such as PLINK, VCF, and IMPUTE2) not currently supported by GenABEL. The Mega2 program and the Mega2R R package are both open source and are freely available, along with extensive documentation, from https://watson.hgen.pitt.edu/register for Mega2 and https://CRAN.R-project.org/package=Mega2R for Mega2R.
format Online
Article
Text
id pubmed-6137409
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher F1000 Research Limited
record_format MEDLINE/PubMed
spelling pubmed-6137409 2018-09-28 The Mega2R package: R tools for accessing and processing genetic data in common formats Baron, Robert V. Stickel, Justin R. Weeks, Daniel E. F1000Res Software Tool Article F1000 Research Limited 2019-02-25 /pmc/articles/PMC6137409/ /pubmed/30271589 http://dx.doi.org/10.12688/f1000research.15949.2 Text en Copyright: © 2019 Baron RV et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title The Mega2R package: R tools for accessing and processing genetic data in common formats
topic Software Tool Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6137409/
https://www.ncbi.nlm.nih.gov/pubmed/30271589
http://dx.doi.org/10.12688/f1000research.15949.2
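
The description in this record notes that Mega2 stores biallelic genotype data in a highly compressed form, like that of the PLINK binary format, and that Mega2R expands portions of the data only when needed. A minimal Python sketch of that general idea follows, assuming a 2-bit code per genotype packed four to a byte. The code values, the `MISSING` constant, and the function names `pack_genotypes` and `unpack_range` are hypothetical and chosen for illustration; they are not the actual Mega2, Mega2R, GenABEL, or PLINK encodings or APIs.

```python
# Hypothetical 2-bit packing of biallelic genotypes (illustration only;
# not the actual Mega2, Mega2R, GenABEL, or PLINK encoding).

MISSING = 3  # assumed code for a missing genotype


def pack_genotypes(genotypes):
    """Pack genotype codes (0, 1, 2, or MISSING) four per byte."""
    packed = bytearray((len(genotypes) + 3) // 4)
    for i, g in enumerate(genotypes):
        if g not in (0, 1, 2, MISSING):
            raise ValueError(f"invalid genotype code: {g}")
        # Each genotype occupies 2 bits; position within the byte is i % 4.
        packed[i // 4] |= g << (2 * (i % 4))
    return bytes(packed)


def unpack_range(packed, start, stop):
    """Expand only genotypes start..stop-1, leaving the rest packed."""
    return [(packed[i // 4] >> (2 * (i % 4))) & 0b11 for i in range(start, stop)]


if __name__ == "__main__":
    genotypes = [0, 1, 2, MISSING, 2, 1, 0]
    packed = pack_genotypes(genotypes)
    print(len(genotypes), "genotypes stored in", len(packed), "bytes")
    print(unpack_range(packed, 2, 5))
```

Under these assumptions, seven genotypes fit in two bytes, and `unpack_range(packed, 2, 5)` expands only the three requested genotypes rather than the whole vector.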