
SNPpy - Database Management for SNP Data from Genome Wide Association Studies


Bibliographic Details
Main authors: Mitha, Faheem; Herodotou, Herodotos; Borisov, Nedyalko; Jiang, Chen; Yoder, Josh; Owzar, Kouros
Format: Online, Article, Text
Language: English
Published: Public Library of Science, 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3198468/
https://www.ncbi.nlm.nih.gov/pubmed/22039405
http://dx.doi.org/10.1371/journal.pone.0024982
Collection: PubMed
Abstract:
BACKGROUND: We describe SNPpy, a hybrid script database system that couples the Python SQLAlchemy library with the PostgreSQL database to manage genotype data from Genome-Wide Association Studies (GWAS). The system can merge study data with HapMap data and merge across studies for meta-analyses, including filtering the data on phenotype and Single-Nucleotide Polymorphism (SNP) values. SNPpy and its dependencies are open-source software.

RESULTS: The current version of SNPpy offers utility functions to import genotype and annotation data from two commercial platforms. We use these to import data from two GWAS studies and the HapMap Project. We then export these individual datasets to standard-format files that can be imported into statistical software for downstream analyses.

CONCLUSIONS: By leveraging relational databases, SNPpy offers integrated management and manipulation of genotype and phenotype data from GWAS studies. Analyzing these studies requires merging across GWAS datasets as well as selecting patients and markers. To this end, SNPpy lets the user filter the data and write the results in standardized GWAS file formats. It performs flexible, low-level data validation, including validation of patient data. SNPpy is a practical and extensible solution for investigators who want to manage their GWAS data centrally.
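The abstract describes a workflow of storing genotypes relationally, merging studies, filtering on phenotype and SNP values, and exporting a standard GWAS file. The sketch below illustrates that idea under stated assumptions. It is not SNPpy's code or schema. It uses Python's built-in `sqlite3` module in place of SQLAlchemy and PostgreSQL so that it runs without a database server. The table layout, the hypothetical functions `open_db`, `load_toy_data` and `export_plink`, the toy data, and the choice of PLINK PED/MAP as the output format are all illustrative assumptions.

```python
import sqlite3
from typing import Sequence

# Hypothetical minimal schema, NOT SNPpy's actual schema. CHECK and
# FOREIGN KEY constraints stand in for database-level data validation.
SCHEMA = """
CREATE TABLE patient (
    id        TEXT PRIMARY KEY,   -- assumed unique across all studies
    study     TEXT NOT NULL,      -- source dataset, e.g. a GWAS or HapMap
    sex       INTEGER NOT NULL CHECK (sex IN (0, 1, 2)),       -- 1 male, 2 female, 0 unknown
    phenotype INTEGER NOT NULL CHECK (phenotype IN (0, 1, 2))  -- 1 control, 2 case, 0 missing
);
CREATE TABLE snp (
    id         TEXT PRIMARY KEY,
    chromosome TEXT NOT NULL,
    position   INTEGER NOT NULL CHECK (position > 0)
);
CREATE TABLE genotype (
    patient_id TEXT NOT NULL REFERENCES patient(id),
    snp_id     TEXT NOT NULL REFERENCES snp(id),
    allele1    TEXT NOT NULL CHECK (allele1 IN ('A', 'C', 'G', 'T', '0')),
    allele2    TEXT NOT NULL CHECK (allele2 IN ('A', 'C', 'G', 'T', '0')),
    PRIMARY KEY (patient_id, snp_id)
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    return conn


def load_toy_data(conn: sqlite3.Connection) -> None:
    """Insert a tiny invented dataset: two studies, three SNPs."""
    conn.executemany("INSERT INTO patient VALUES (?, ?, ?, ?)",
                     [("P1", "studyA", 1, 2), ("P2", "studyA", 2, 1), ("P3", "studyB", 2, 2)])
    conn.executemany("INSERT INTO snp VALUES (?, ?, ?)",
                     [("rs1", "1", 1000), ("rs2", "1", 500), ("rs3", "2", 700)])
    conn.executemany("INSERT INTO genotype VALUES (?, ?, ?, ?)",
                     [("P1", "rs1", "A", "G"), ("P1", "rs2", "C", "C"),
                      ("P2", "rs1", "G", "G"), ("P3", "rs2", "T", "C")])


def export_plink(conn: sqlite3.Connection, studies: Sequence[str],
                 phenotypes: Sequence[int], chromosome: str):
    """Merge patients from several studies, keep those whose phenotype is in
    `phenotypes`, restrict to SNPs on `chromosome`, and return
    (ped_lines, map_lines) in PLINK PED/MAP layout."""
    snps = conn.execute(
        "SELECT id, chromosome, position FROM snp WHERE chromosome = ? ORDER BY position",
        (chromosome,),
    ).fetchall()
    # MAP columns: chromosome, SNP id, genetic distance (0 = unknown), bp position
    map_lines = [f"{chrom}\t{sid}\t0\t{pos}" for sid, chrom, pos in snps]

    study_marks = ",".join("?" * len(studies))
    pheno_marks = ",".join("?" * len(phenotypes))
    patients = conn.execute(
        f"SELECT id, study, sex, phenotype FROM patient "
        f"WHERE study IN ({study_marks}) AND phenotype IN ({pheno_marks}) "
        f"ORDER BY study, id",
        (*studies, *phenotypes),
    ).fetchall()

    ped_lines = []
    for pid, study, sex, pheno in patients:
        calls = {sid: (a1, a2) for sid, a1, a2 in conn.execute(
            "SELECT snp_id, allele1, allele2 FROM genotype WHERE patient_id = ?", (pid,))}
        alleles = []
        for sid, _, _ in snps:
            alleles.extend(calls.get(sid, ("0", "0")))  # no call -> missing "0 0"
        # PED columns: family id (the study, by assumption), individual id,
        # father, mother, sex, phenotype, then two alleles per SNP in MAP order.
        ped_lines.append(" ".join([study, pid, "0", "0", str(sex), str(pheno), *alleles]))
    return ped_lines, map_lines


conn = open_db()
load_toy_data(conn)
# Cases (phenotype 2) from both studies, SNPs on chromosome 1 only.
ped, map_ = export_plink(conn, ["studyA", "studyB"], [2], "1")
print("\n".join(ped))
# studyA P1 0 0 1 2 C C A G
# studyB P3 0 0 2 2 T C 0 0
```

Keeping every study in one `patient` table tagged by `study` turns a cross-study merge into a plain `WHERE study IN (...)` filter. The constraints reject a malformed allele or a genotype for an unknown patient at insert time rather than at export time.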
ID: pubmed-3198468
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: PLoS One (Research Article), published 2011-10-19
Copyright: Mitha et al.
License: This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
Topic: Research Article