SPSmart: adapting population based SNP genotype databases for fast and comprehensive web access

BACKGROUND: In the last five years large online resources of human variability have appeared, notably HapMap, Perlegen and the CEPH foundation. These databases of genotypes with population information act as catalogues of human diversity, and are widely used as reference sources for population genet...

Bibliographic Details
Main authors: Amigo, Jorge; Salas, Antonio; Phillips, Christopher; Carracedo, Ángel
Format: Text
Language: English
Published: BioMed Central 2008
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2576268/
https://www.ncbi.nlm.nih.gov/pubmed/18847484
http://dx.doi.org/10.1186/1471-2105-9-428
_version_ 1782160381545807872
author Amigo, Jorge
Salas, Antonio
Phillips, Christopher
Carracedo, Ángel
author_facet Amigo, Jorge
Salas, Antonio
Phillips, Christopher
Carracedo, Ángel
author_sort Amigo, Jorge
collection PubMed
description BACKGROUND: In the last five years large online resources of human variability have appeared, notably HapMap, Perlegen and the CEPH foundation. These databases of genotypes with population information act as catalogues of human diversity, and are widely used as reference sources for population genetics studies. Although many useful conclusions may be extracted by querying databases individually, the lack of flexibility for combining data from within and between each database does not allow the calculation of key population variability statistics. RESULTS: We have developed a novel tool for accessing and combining large-scale genomic databases of single nucleotide polymorphisms (SNPs) in widespread use in human population genetics: SPSmart (SNPs for Population Studies). A fast pipeline creates and maintains a data mart from the most commonly accessed databases of genotypes containing population information: data is mined, summarized into the standard statistical reference indices, and stored into a relational database that currently handles as many as 4 × 10^9 genotypes and that can be easily extended to new database initiatives. We have also built a web interface to the data mart that allows the browsing of underlying data indexed by population and the combining of populations, allowing intuitive and straightforward comparison of population groups. All the information served is optimized for web display, and most of the computations are already pre-processed in the data mart to speed up the data browsing and any computational treatment requested. CONCLUSION: In practice, SPSmart allows populations to be combined into user-defined groups, while multiple databases can be accessed and compared in a few simple steps from a single query. It performs the queries rapidly and gives straightforward graphical summaries of SNP population variability through visual inspection of allele frequencies outlined in standard pie-chart format. In addition, full numerical description of the data is output in statistical results panels that include common population genetics metrics such as heterozygosity, Fst and In.
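
The description above mentions combining populations into user-defined groups and reporting heterozygosity and Fst. The following is a minimal illustrative sketch of how such statistics can be derived from per-population allele counts. It is not SPSmart's code: the record does not state which estimators the tool uses, so the heterozygosity-based Fst shown here, the `AlleleCounts` data shape and all function names are assumptions chosen for illustration.

```python
from typing import Dict, List

# Hypothetical data shape: allele label -> number of observed copies at one SNP.
AlleleCounts = Dict[str, int]


def pool_counts(populations: List[AlleleCounts]) -> AlleleCounts:
    """Combine several populations into one group by summing allele counts."""
    pooled: AlleleCounts = {}
    for counts in populations:
        for allele, n in counts.items():
            pooled[allele] = pooled.get(allele, 0) + n
    return pooled


def frequencies(counts: AlleleCounts) -> Dict[str, float]:
    """Convert allele counts into allele frequencies."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no alleles observed")
    return {allele: n / total for allele, n in counts.items()}


def expected_heterozygosity(counts: AlleleCounts) -> float:
    """Expected heterozygosity: 1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in frequencies(counts).values())


def fst(populations: List[AlleleCounts]) -> float:
    """Heterozygosity-based Fst = (H_T - H_S) / H_T (an assumed, simple estimator).

    H_S is the sample-size-weighted mean of the within-population values;
    H_T is computed from the pooled allele counts.
    """
    sizes = [sum(c.values()) for c in populations]
    total = sum(sizes)
    h_s = sum(expected_heterozygosity(c) * n for c, n in zip(populations, sizes)) / total
    h_t = expected_heterozygosity(pool_counts(populations))
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


if __name__ == "__main__":
    # Made-up counts for two populations at a single biallelic SNP.
    pop_a = {"A": 18, "G": 2}
    pop_b = {"A": 6, "G": 14}
    print(frequencies(pool_counts([pop_a, pop_b])))
    print(expected_heterozygosity(pop_a), fst([pop_a, pop_b]))
```

Storing counts rather than frequencies keeps pooling exact, since a combined group's frequency is recomputed from summed counts instead of averaging per-population frequencies.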
format Text
id pubmed-2576268
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2576268 2008-10-31 SPSmart: adapting population based SNP genotype databases for fast and comprehensive web access Amigo, Jorge Salas, Antonio Phillips, Christopher Carracedo, Ángel BMC Bioinformatics Software BioMed Central 2008-10-10 /pmc/articles/PMC2576268/ /pubmed/18847484 http://dx.doi.org/10.1186/1471-2105-9-428 Text en Copyright © 2008 Amigo et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Software
Amigo, Jorge
Salas, Antonio
Phillips, Christopher
Carracedo, Ángel
SPSmart: adapting population based SNP genotype databases for fast and comprehensive web access
title SPSmart: adapting population based SNP genotype databases for fast and comprehensive web access
title_full SPSmart: adapting population based SNP genotype databases for fast and comprehensive web access
title_fullStr SPSmart: adapting population based SNP genotype databases for fast and comprehensive web access
title_full_unstemmed SPSmart: adapting population based SNP genotype databases for fast and comprehensive web access
title_short SPSmart: adapting population based SNP genotype databases for fast and comprehensive web access
title_sort spsmart: adapting population based snp genotype databases for fast and comprehensive web access
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2576268/
https://www.ncbi.nlm.nih.gov/pubmed/18847484
http://dx.doi.org/10.1186/1471-2105-9-428
work_keys_str_mv AT amigojorge spsmartadaptingpopulationbasedsnpgenotypedatabasesforfastandcomprehensivewebaccess
AT salasantonio spsmartadaptingpopulationbasedsnpgenotypedatabasesforfastandcomprehensivewebaccess
AT phillipschristopher spsmartadaptingpopulationbasedsnpgenotypedatabasesforfastandcomprehensivewebaccess
AT carracedoangel spsmartadaptingpopulationbasedsnpgenotypedatabasesforfastandcomprehensivewebaccess