FIGG: Simulating populations of whole genome sequences for heterogeneous data analyses

BACKGROUND: High-throughput sequencing has become one of the primary tools for investigation of the molecular basis of disease. The increasing use of sequencing in investigations that aim to understand both individuals and populations is challenging our ability to develop analysis tools that scale w...

Bibliographic Details
Main Authors:	Killcoyne, Sarah; del Sol, Antonio
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2014
Subjects:	Software
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4039316/ https://www.ncbi.nlm.nih.gov/pubmed/24885193 http://dx.doi.org/10.1186/1471-2105-15-149

_version_	1782318468469620736
author	Killcoyne, Sarah; del Sol, Antonio
author_facet	Killcoyne, Sarah; del Sol, Antonio
author_sort	Killcoyne, Sarah
collection	PubMed
description	BACKGROUND: High-throughput sequencing has become one of the primary tools for investigation of the molecular basis of disease. The increasing use of sequencing in investigations that aim to understand both individuals and populations is challenging our ability to develop analysis tools that scale with the data. This issue is of particular concern in studies that exhibit a wide degree of heterogeneity or deviation from the standard reference genome. The advent of population scale sequencing studies requires analysis tools that are developed and tested against matching quantities of heterogeneous data. RESULTS: We developed a large-scale whole genome simulation tool, FIGG, which generates large numbers of whole genomes with known sequence characteristics based on direct sampling of experimentally known or theorized variations. For normal variations we used publicly available data to determine the frequency of different mutation classes across the genome. FIGG then uses this information as a background to generate new sequences from a parent sequence with matching frequencies, but different actual mutations. The background can be normal variations, known disease variations, or a theoretical frequency distribution of variations. CONCLUSION: In order to enable the creation of large numbers of genomes, FIGG generates simulated sequences from known genomic variation and iteratively mutates each genome separately. The result is multiple whole genome sequences with unique variations that can primarily be used to provide different reference genomes, model heterogeneous populations, and can offer a standard test environment for new analysis algorithms or bioinformatics tools.
format	Online Article Text
id	pubmed-4039316
institution	National Center for Biotechnology Information
language	English
publishDate	2014
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-4039316 2014-05-31 FIGG: Simulating populations of whole genome sequences for heterogeneous data analyses Killcoyne, Sarah; del Sol, Antonio BMC Bioinformatics Software BioMed Central 2014-05-19 /pmc/articles/PMC4039316/ /pubmed/24885193 http://dx.doi.org/10.1186/1471-2105-15-149 Text en Copyright © 2014 Killcoyne and del Sol; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.
spellingShingle	Software Killcoyne, Sarah del Sol, Antonio FIGG: Simulating populations of whole genome sequences for heterogeneous data analyses
title	FIGG: Simulating populations of whole genome sequences for heterogeneous data analyses
title_sort	figg: simulating populations of whole genome sequences for heterogeneous data analyses
topic	Software
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4039316/ https://www.ncbi.nlm.nih.gov/pubmed/24885193 http://dx.doi.org/10.1186/1471-2105-15-149
work_keys_str_mv	AT killcoynesarah figgsimulatingpopulationsofwholegenomesequencesforheterogeneousdataanalyses AT delsolantonio figgsimulatingpopulationsofwholegenomesequencesforheterogeneousdataanalyses
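
The description field above says that new sequences are generated from a parent sequence using background mutation-class frequencies, so that each genome matches those frequencies but carries different actual mutations. The following is a minimal sketch of that general idea only, not FIGG's implementation: the frequency values, the three mutation classes, and the names `BACKGROUND`, `mutate`, and `simulate_population` are all hypothetical and chosen for illustration.

```python
import random

# Hypothetical per-base frequencies for three mutation classes.
# These values are illustrative assumptions, not figures from the paper.
BACKGROUND = {"snv": 0.01, "insertion": 0.002, "deletion": 0.002}
BASES = "ACGT"


def mutate(parent, background, rng):
    """Return (child, variants) for one simulated sequence.

    Each parent base is independently left alone, substituted, followed by
    one inserted base, or deleted, with probabilities taken from `background`.
    """
    snv_cut = background["snv"]
    ins_cut = snv_cut + background["insertion"]
    del_cut = ins_cut + background["deletion"]
    child = []
    variants = []  # (position in parent, class, reference, alternate)
    for pos, base in enumerate(parent):
        roll = rng.random()
        if roll < snv_cut:
            alt = rng.choice([b for b in BASES if b != base])
            child.append(alt)
            variants.append((pos, "snv", base, alt))
        elif roll < ins_cut:
            alt = base + rng.choice(BASES)
            child.append(alt)
            variants.append((pos, "insertion", base, alt))
        elif roll < del_cut:
            variants.append((pos, "deletion", base, ""))
        else:
            child.append(base)
    return "".join(child), variants


def simulate_population(parent, background, n, seed=0):
    """Mutate the same parent n times, each time independently."""
    rng = random.Random(seed)
    return [mutate(parent, background, rng) for _ in range(n)]


if __name__ == "__main__":
    parent = "ACGT" * 250
    for child, variants in simulate_population(parent, BACKGROUND, n=3, seed=1):
        print(len(child), len(variants))
```

Because every genome is drawn separately from the same background, the population shares class frequencies in expectation while the individual variant lists differ. A real tool would also need to handle larger structural variants and position-dependent frequencies, which this sketch leaves out.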
