Genotype harmonizer: automatic strand alignment and format conversion for genotype data integration

Bibliographic Details
Main Authors:	Deelen, Patrick; Bonder, Marc Jan; van der Velde, K Joeri; Westra, Harm-Jan; Winder, Erwin; Hendriksen, Dennis; Franke, Lude; Swertz, Morris A
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2014
Subjects:	Technical Note
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4307387/ https://www.ncbi.nlm.nih.gov/pubmed/25495213 http://dx.doi.org/10.1186/1756-0500-7-901

Description
Summary:	BACKGROUND: To gain statistical power or to allow fine mapping, researchers typically want to pool data before meta-analyses or genotype imputation. However, the necessary harmonization of genetic datasets is currently error-prone because of many different file formats and lack of clarity about which genomic strand is used as reference. FINDINGS: Genotype Harmonizer (GH) is a command-line tool to harmonize genetic datasets by automatically solving issues concerning genomic strand and file format. GH solves the unknown strand issue by aligning ambiguous A/T and G/C SNPs to a specified reference, using linkage disequilibrium patterns without prior knowledge of the used strands. GH supports many common GWAS/NGS genotype formats including PLINK, binary PLINK, VCF, SHAPEIT2 & Oxford GEN. GH is implemented in Java and a large part of the functionality can also be used as Java ‘Genotype-IO’ API. All software is open source under license LGPLv3 and available from http://www.molgenis.org/systemsgenetics. CONCLUSIONS: GH can be used to harmonize genetic datasets across different file formats and can be easily integrated as a step in routine meta-analysis and imputation pipelines.
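
The summary's central idea, aligning ambiguous A/T and G/C SNPs by comparing linkage disequilibrium patterns with a reference, can be illustrated with a minimal Python sketch. This is not GH's implementation or its Genotype-IO API. The function names (`is_ambiguous`, `pearson`, `should_swap`), the use of allele dosages, and the simple majority vote over neighbouring SNPs are all assumptions made for illustration.

```python
from math import sqrt

# Watson-Crick complements: an A/T or G/C SNP reads the same on either strand.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_ambiguous(allele1, allele2):
    """True for A/T and G/C SNPs, whose strand cannot be told from the alleles."""
    return COMPLEMENT[allele1] == allele2


def pearson(xs, ys):
    """Pearson correlation of two equal-length dosage vectors (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def should_swap(study_snp, study_neighbours, ref_snp, ref_neighbours):
    """Hypothetical vote: swap when LD signs in the study mostly oppose the reference.

    Each neighbour is a dosage vector for a nearby SNP that is assumed to be
    already aligned (for example, a non-ambiguous SNP). Study and reference
    may contain different numbers of samples.
    """
    agree = disagree = 0
    for study_nb, ref_nb in zip(study_neighbours, ref_neighbours):
        product = pearson(study_snp, study_nb) * pearson(ref_snp, ref_nb)
        if product > 0:
            agree += 1
        elif product < 0:
            disagree += 1
    return disagree > agree


# Toy data: the study SNP is negatively correlated with its neighbour,
# while the reference shows a positive correlation, so a swap is suggested.
flipped = should_swap([2, 1, 0, 0, 2], [[0, 1, 2, 2, 0]],
                      [0, 1, 2, 2, 0, 1], [[0, 1, 2, 2, 0, 1]])
print(flipped)
```

Voting on the sign of the correlation rather than its magnitude keeps the sketch independent of sample size. A real tool would also need thresholds for weak LD and for SNPs with too few informative neighbours.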
