Genotype harmonizer: automatic strand alignment and format conversion for genotype data integration

BACKGROUND: To gain statistical power or to allow fine mapping, researchers typically want to pool data before meta-analyses or genotype imputation. However, the necessary harmonization of genetic datasets is currently error-prone because of many different file formats and lack of clarity about whic...

Bibliographic Details
Main authors: Deelen, Patrick; Bonder, Marc Jan; van der Velde, K Joeri; Westra, Harm-Jan; Winder, Erwin; Hendriksen, Dennis; Franke, Lude; Swertz, Morris A
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4307387/
https://www.ncbi.nlm.nih.gov/pubmed/25495213
http://dx.doi.org/10.1186/1756-0500-7-901
_version_ 1782354460811460608
author Deelen, Patrick
Bonder, Marc Jan
van der Velde, K Joeri
Westra, Harm-Jan
Winder, Erwin
Hendriksen, Dennis
Franke, Lude
Swertz, Morris A
author_sort Deelen, Patrick
collection PubMed
description BACKGROUND: To gain statistical power or to allow fine mapping, researchers typically want to pool data before meta-analyses or genotype imputation. However, the necessary harmonization of genetic datasets is currently error-prone because of many different file formats and lack of clarity about which genomic strand is used as reference. FINDINGS: Genotype Harmonizer (GH) is a command-line tool to harmonize genetic datasets by automatically solving issues concerning genomic strand and file format. GH solves the unknown strand issue by aligning ambiguous A/T and G/C SNPs to a specified reference, using linkage disequilibrium patterns without prior knowledge of the used strands. GH supports many common GWAS/NGS genotype formats including PLINK, binary PLINK, VCF, SHAPEIT2 & Oxford GEN. GH is implemented in Java and a large part of the functionality can also be used as Java ‘Genotype-IO’ API. All software is open source under license LGPLv3 and available from http://www.molgenis.org/systemsgenetics. CONCLUSIONS: GH can be used to harmonize genetic datasets across different file formats and can be easily integrated as a step in routine meta-analysis and imputation pipelines.
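The strand problem described in the abstract can be illustrated with a minimal Python sketch. It is not the Genotype Harmonizer implementation or its Genotype-IO API; all function names are hypothetical, and it assumes genotypes are coded as per-sample dosages (0, 1 or 2 copies of the first listed allele). A/T and G/C SNPs carry the same pair of allele labels on either strand, so the sketch compares the sign of the correlation with nearby unambiguous SNPs in the study data and in the reference, and votes for a swap when the signs mostly disagree.

```python
from math import sqrt

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_strand_ambiguous(allele1, allele2):
    """Return True for A/T and G/C SNPs, whose alleles are each other's complement."""
    return COMPLEMENT[allele1] == allele2


def pearson(x, y):
    """Pearson correlation of two equal-length dosage lists (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def should_swap_strand(study_snp, study_neighbours, ref_snp, ref_neighbours):
    """Vote over neighbouring SNPs: swap if the LD sign mostly differs from the reference.

    study_neighbours and ref_neighbours list the same unambiguous SNPs in the same
    order, each already aligned to the reference alleles (an assumption of this sketch).
    """
    agree = disagree = 0
    for study_nb, ref_nb in zip(study_neighbours, ref_neighbours):
        product = pearson(study_snp, study_nb) * pearson(ref_snp, ref_nb)
        if product > 0:
            agree += 1
        elif product < 0:
            disagree += 1
    return disagree > agree


# Toy data: the reference SNP is in perfect positive LD with its neighbour.
ref_snp = [0, 1, 2, 1, 0, 2]
ref_neighbours = [[0, 1, 2, 1, 0, 2]]
# Study samples in which the ambiguous SNP was reported on the opposite strand.
study_neighbours = [[0, 1, 2, 0, 1, 2]]
study_snp_opposite = [2, 1, 0, 2, 1, 0]
study_snp_same = [0, 1, 2, 0, 1, 2]
```

Calling `should_swap_strand(study_snp_opposite, study_neighbours, ref_snp, ref_neighbours)` on the toy data reports that a swap is needed, while the same call with `study_snp_same` does not. A real tool would also need thresholds on LD strength and on the number of informative neighbours, which this sketch leaves out.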
format Online
Article
Text
id pubmed-4307387
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4307387 2015-01-28 BMC Res Notes Technical Note BioMed Central 2014-12-11 /pmc/articles/PMC4307387/ /pubmed/25495213 http://dx.doi.org/10.1186/1756-0500-7-901 Text en
© Deelen et al.; licensee BioMed Central. 2014. This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Genotype harmonizer: automatic strand alignment and format conversion for genotype data integration
topic Technical Note