
ENGINES: exploring single nucleotide variation in entire human genomes


Bibliographic details
Main authors: Amigo, Jorge; Salas, Antonio; Phillips, Christopher
Format: Online article, text
Language: English
Published: BioMed Central 2011
Subjects: Database
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3107182/
https://www.ncbi.nlm.nih.gov/pubmed/21504571
http://dx.doi.org/10.1186/1471-2105-12-105
author Amigo, Jorge
Salas, Antonio
Phillips, Christopher
collection PubMed
description BACKGROUND: Next generation ultra-sequencing technologies are starting to produce extensive quantities of data from entire human genome or exome sequences, and therefore new software is needed to present and analyse this vast amount of information. The 1000 Genomes project has recently released raw data for 629 complete genomes representing several human populations through their Phase I interim analysis and, although there are certain public tools available that allow exploration of these genomes, to date there is no tool that permits comprehensive population analysis of the variation catalogued by such data.
DESCRIPTION: We have developed a genetic variant site explorer able to retrieve data for Single Nucleotide Variation (SNVs), population by population, from entire genomes without compromising future scalability and agility. ENGINES (ENtire Genome INterface for Exploring SNVs) uses data from the 1000 Genomes Phase I to demonstrate its capacity to handle large amounts of genetic variation (>7.3 billion genotypes and 28 million SNVs), as well as deriving summary statistics of interest for medical and population genetics applications. The whole dataset is pre-processed and summarized into a data mart accessible through a web interface. The query system allows the combination and comparison of each available population sample, while searching by rs-number list, chromosome region, or genes of interest. Frequency and F(ST) filters are available to further refine queries, while results can be visually compared with other large-scale Single Nucleotide Polymorphism (SNP) repositories such as HapMap or Perlegen.
CONCLUSIONS: ENGINES is capable of accessing large-scale variation data repositories in a fast and comprehensive manner. It allows quick browsing of whole genome variation, while providing statistical information for each variant site such as allele frequency, heterozygosity or F(ST) values for genetic differentiation. Access to the data mart generating scripts and to the web interface is granted from http://spsmart.cesga.es/engines.php
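The per-site summary statistics named in the description (allele frequency, heterozygosity and F(ST)) can be illustrated with a short Python sketch of how raw genotypes might be collapsed into per-population allele counts and then filtered. This record does not specify the ENGINES data-mart schema or which F(ST) estimator it uses, so everything below is an assumption: the names `SiteSummary`, `summarize`, `fst` and `passes_filters`, the 0/1/2 genotype encoding, the population labels `POP_A`/`POP_B`, the identifier `rs_example`, and the choice of Nei's G_ST = (H_T - H_S) / H_T (with expected heterozygosity 2p(1 - p) and unweighted means across populations) as the F(ST) estimator are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical genotype encoding: each call is the number of copies (0, 1 or 2)
# of the alternate allele carried by one diploid individual.


@dataclass
class SiteSummary:
    """One row of a hypothetical per-site, per-population summary table."""
    rs_id: str
    population: str
    alt_count: int     # copies of the alternate allele in this population
    allele_total: int  # 2 * number of genotyped individuals

    @property
    def alt_freq(self) -> float:
        """Alternate-allele frequency p."""
        return self.alt_count / self.allele_total

    @property
    def expected_het(self) -> float:
        """Expected heterozygosity 2p(1 - p) under Hardy-Weinberg proportions."""
        p = self.alt_freq
        return 2.0 * p * (1.0 - p)


def summarize(rs_id: str, genotypes: Dict[str, List[int]]) -> List[SiteSummary]:
    """Collapse raw 0/1/2 genotype calls into per-population allele counts.

    Assumes every population has at least one genotyped individual.
    """
    return [
        SiteSummary(rs_id, pop, sum(calls), 2 * len(calls))
        for pop, calls in genotypes.items()
    ]


def fst(rows: List[SiteSummary]) -> float:
    """Nei's G_ST = (H_T - H_S) / H_T with unweighted population means (assumed estimator)."""
    freqs = [r.alt_freq for r in rows]
    p_bar = sum(freqs) / len(freqs)
    h_t = 2.0 * p_bar * (1.0 - p_bar)
    if h_t == 0.0:
        return 0.0  # site is monomorphic in every population: no differentiation
    h_s = sum(r.expected_het for r in rows) / len(rows)
    return (h_t - h_s) / h_t


def passes_filters(rows: List[SiteSummary], min_maf: float, min_fst: float) -> bool:
    """Hypothetical frequency + F(ST) filter: keep the site if at least one
    population reaches min_maf (minor allele frequency) and F(ST) >= min_fst."""
    mafs = [min(r.alt_freq, 1.0 - r.alt_freq) for r in rows]
    return any(m >= min_maf for m in mafs) and fst(rows) >= min_fst


if __name__ == "__main__":
    rows = summarize("rs_example", {"POP_A": [0, 1, 2, 2], "POP_B": [0, 0, 0, 1]})
    for r in rows:
        print(r.population, r.alt_freq, round(r.expected_het, 5))
    print("F(ST):", round(fst(rows), 3))
    print("passes:", passes_filters(rows, min_maf=0.1, min_fst=0.2))
```

For the toy input in the example call, `POP_A` has an alternate-allele frequency of 0.625 and `POP_B` of 0.125, giving H_S = 0.34375, H_T = 0.46875 and `fst(rows)` = 4/15 (about 0.267). Storing counts rather than frequencies in the summary rows is a deliberate choice in this sketch: counts can simply be summed when population samples are combined into a pooled sample, whereas frequencies cannot.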
format Online
Article
Text
id pubmed-3107182
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3107182 2011-06-03. ENGINES: exploring single nucleotide variation in entire human genomes. Amigo, Jorge; Salas, Antonio; Phillips, Christopher. BMC Bioinformatics, Database. BioMed Central 2011-04-19 /pmc/articles/PMC3107182/ /pubmed/21504571 http://dx.doi.org/10.1186/1471-2105-12-105 Text, English.
Copyright ©2011 Amigo et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title ENGINES: exploring single nucleotide variation in entire human genomes
topic Database
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3107182/
https://www.ncbi.nlm.nih.gov/pubmed/21504571
http://dx.doi.org/10.1186/1471-2105-12-105