Genotype harmonizer: automatic strand alignment and format conversion for genotype data integration
BACKGROUND: To gain statistical power or to allow fine mapping, researchers typically want to pool data before meta-analyses or genotype imputation. However, the necessary harmonization of genetic datasets is currently error-prone because of many different file formats and lack of clarity about whic...
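Main authors: | Deelen, Patrick; Bonder, Marc Jan; van der Velde, K Joeri; Westra, Harm-Jan; Winder, Erwin; Hendriksen, Dennis; Franke, Lude; Swertz, Morris A |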
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2014 |
Subjects: | Technical Note |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4307387/ https://www.ncbi.nlm.nih.gov/pubmed/25495213 http://dx.doi.org/10.1186/1756-0500-7-901 |
_version_ | 1782354460811460608 |
---|---|
author | Deelen, Patrick; Bonder, Marc Jan; van der Velde, K Joeri; Westra, Harm-Jan; Winder, Erwin; Hendriksen, Dennis; Franke, Lude; Swertz, Morris A |
author_facet | Deelen, Patrick; Bonder, Marc Jan; van der Velde, K Joeri; Westra, Harm-Jan; Winder, Erwin; Hendriksen, Dennis; Franke, Lude; Swertz, Morris A |
author_sort | Deelen, Patrick |
collection | PubMed |
description | BACKGROUND: To gain statistical power or to allow fine mapping, researchers typically want to pool data before meta-analyses or genotype imputation. However, the necessary harmonization of genetic datasets is currently error-prone because of many different file formats and lack of clarity about which genomic strand is used as reference. FINDINGS: Genotype Harmonizer (GH) is a command-line tool to harmonize genetic datasets by automatically solving issues concerning genomic strand and file format. GH solves the unknown strand issue by aligning ambiguous A/T and G/C SNPs to a specified reference, using linkage disequilibrium patterns without prior knowledge of the strands used. GH supports many common GWAS/NGS genotype formats including PLINK, binary PLINK, VCF, SHAPEIT2 and Oxford GEN. GH is implemented in Java and a large part of the functionality can also be used through the Java ‘Genotype-IO’ API. All software is open source under the LGPLv3 license and available from http://www.molgenis.org/systemsgenetics. CONCLUSIONS: GH can be used to harmonize genetic datasets across different file formats and can be easily integrated as a step in routine meta-analysis and imputation pipelines. |
format | Online Article Text |
id | pubmed-4307387 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4307387; 2015-01-28; Genotype harmonizer: automatic strand alignment and format conversion for genotype data integration; Deelen, Patrick; Bonder, Marc Jan; van der Velde, K Joeri; Westra, Harm-Jan; Winder, Erwin; Hendriksen, Dennis; Franke, Lude; Swertz, Morris A; BMC Res Notes; Technical Note; BACKGROUND: To gain statistical power or to allow fine mapping, researchers typically want to pool data before meta-analyses or genotype imputation. However, the necessary harmonization of genetic datasets is currently error-prone because of many different file formats and lack of clarity about which genomic strand is used as reference. FINDINGS: Genotype Harmonizer (GH) is a command-line tool to harmonize genetic datasets by automatically solving issues concerning genomic strand and file format. GH solves the unknown strand issue by aligning ambiguous A/T and G/C SNPs to a specified reference, using linkage disequilibrium patterns without prior knowledge of the strands used. GH supports many common GWAS/NGS genotype formats including PLINK, binary PLINK, VCF, SHAPEIT2 and Oxford GEN. GH is implemented in Java and a large part of the functionality can also be used through the Java ‘Genotype-IO’ API. All software is open source under the LGPLv3 license and available from http://www.molgenis.org/systemsgenetics. CONCLUSIONS: GH can be used to harmonize genetic datasets across different file formats and can be easily integrated as a step in routine meta-analysis and imputation pipelines.; BioMed Central; 2014-12-11; /pmc/articles/PMC4307387/; /pubmed/25495213; http://dx.doi.org/10.1186/1756-0500-7-901; Text; en; © Deelen et al.; licensee BioMed Central. 2014. This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Technical Note; Deelen, Patrick; Bonder, Marc Jan; van der Velde, K Joeri; Westra, Harm-Jan; Winder, Erwin; Hendriksen, Dennis; Franke, Lude; Swertz, Morris A; Genotype harmonizer: automatic strand alignment and format conversion for genotype data integration |
title | Genotype harmonizer: automatic strand alignment and format conversion for genotype data integration |
title_full | Genotype harmonizer: automatic strand alignment and format conversion for genotype data integration |
title_fullStr | Genotype harmonizer: automatic strand alignment and format conversion for genotype data integration |
title_full_unstemmed | Genotype harmonizer: automatic strand alignment and format conversion for genotype data integration |
title_short | Genotype harmonizer: automatic strand alignment and format conversion for genotype data integration |
title_sort | genotype harmonizer: automatic strand alignment and format conversion for genotype data integration |
topic | Technical Note |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4307387/ https://www.ncbi.nlm.nih.gov/pubmed/25495213 http://dx.doi.org/10.1186/1756-0500-7-901 |
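
The abstract describes aligning ambiguous A/T and G/C SNPs to a reference by using linkage disequilibrium (LD) patterns. The general idea can be illustrated with a minimal Python sketch. It is not Genotype Harmonizer's implementation and does not use the Genotype-IO API: the function names, the `min_abs_r` threshold, and the dosage encoding (0, 1 or 2 copies of one named allele per sample) are all assumptions made for illustration. The sketch compares the sign of the correlation between an ambiguous SNP and its already aligned neighbours in the study data and in the reference panel; if most signs disagree, the study SNP is presumably reported on the opposite strand.

```python
from math import sqrt

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_ambiguous(allele1, allele2):
    """True for A/T and G/C SNPs: the allele pair looks the same on both strands."""
    return COMPLEMENT[allele1] == allele2


def pearson(x, y):
    """Pearson correlation of two equal-length dosage vectors (0.0 if one is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def needs_strand_swap(study_target, study_neighbors,
                      ref_target, ref_neighbors, min_abs_r=0.3):
    """Vote on whether an ambiguous study SNP is on the opposite strand.

    study_target / ref_target: dosages of the same named allele of the
        ambiguous SNP in the study samples and in the reference samples.
    study_neighbors / ref_neighbors: dosage vectors of nearby SNPs that are
        already aligned, in the same order in both lists.
    min_abs_r: hypothetical cut-off; weakly correlated neighbours are skipped.

    Returns True (swap), False (keep) or None (tie or no informative neighbour).
    """
    agree = disagree = 0
    for study_nb, ref_nb in zip(study_neighbors, ref_neighbors):
        r_study = pearson(study_target, study_nb)
        r_ref = pearson(ref_target, ref_nb)
        if abs(r_study) < min_abs_r or abs(r_ref) < min_abs_r:
            continue  # too little LD to be informative
        if (r_study > 0) == (r_ref > 0):
            agree += 1
        else:
            disagree += 1
    if agree == disagree:
        return None
    return disagree > agree


# Toy data: two neighbours, one positively and one negatively correlated
# with the ambiguous SNP in the reference panel.
ref_target = [0, 1, 2, 1, 0, 2]
ref_neighbors = [[0, 1, 2, 1, 0, 2], [2, 1, 0, 1, 2, 0]]
study_neighbors = [[0, 1, 2, 1, 0, 2], [2, 1, 0, 1, 2, 0]]
study_same_strand = [0, 1, 2, 1, 0, 2]
study_other_strand = [2 - d for d in study_same_strand]  # allele counted on the other strand
```

Voting over several neighbours rather than trusting a single pair keeps one noisy correlation from deciding the outcome; a real tool would also need rules for SNPs without enough informative neighbours, which this sketch simply reports as `None`.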