Viability of in-house datamarting approaches for population genetics analysis of SNP genotypes

BACKGROUND: Databases containing very large amounts of SNP (Single Nucleotide Polymorphism) data are now freely available for researchers interested in medical and/or population genetics applications. While many of these SNP repositories have implemented data retrieval tools for general-purpose mini...

Bibliographic Details
Main authors:	Amigo, Jorge; Phillips, Christopher; Salas, Antonio; Carracedo, Ángel
Format:	Text
Language:	English
Published:	BioMed Central 2009
Subjects:	Proceedings
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2665053/ https://www.ncbi.nlm.nih.gov/pubmed/19344481 http://dx.doi.org/10.1186/1471-2105-10-S3-S5

author	Amigo, Jorge; Phillips, Christopher; Salas, Antonio; Carracedo, Ángel
author_sort	Amigo, Jorge
collection	PubMed
description	BACKGROUND: Databases containing very large amounts of SNP (Single Nucleotide Polymorphism) data are now freely available for researchers interested in medical and/or population genetics applications. While many of these SNP repositories have implemented data retrieval tools for general-purpose mining, these alone cannot cover the broad spectrum of needs of most medical and population genetics studies. RESULTS: To address this limitation, we have built in-house customized data marts from the raw data provided by the largest public databases. In particular, for population genetics analysis based on genotypes we have built a set of data processing scripts that deal with raw data coming from the major SNP variation databases (e.g. HapMap, Perlegen), stripping them into single genotypes and then grouping them into populations, then merged with additional complementary descriptive information extracted from dbSNP. This allows not only in-house standardization and normalization of the genotyping data retrieved from different repositories, but also the calculation of statistical indices from simple allele frequency estimates to more elaborate genetic differentiation tests within populations, together with the ability to combine population samples from different databases. CONCLUSION: The present study demonstrates the viability of implementing scripts for handling extensive datasets of SNP genotypes with low computational costs, dealing with certain complex issues that arise from the divergent nature and configuration of the most popular SNP repositories. The information contained in these databases can also be enriched with additional information obtained from other complementary databases, in order to build a dedicated data mart. Updating the data structure is straightforward, as well as permitting easy implementation of new external data and the computation of supplementary statistical indices of interest.
format	Text
id	pubmed-2665053
institution	National Center for Biotechnology Information
language	English
publishDate	2009
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-2665053 2009-04-04 BMC Bioinformatics Proceedings BioMed Central 2009-03-19 /pmc/articles/PMC2665053/ /pubmed/19344481 http://dx.doi.org/10.1186/1471-2105-10-S3-S5 Text en Copyright © 2009 Amigo et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Viability of in-house datamarting approaches for population genetics analysis of SNP genotypes
topic	Proceedings
