
Characteristics of 454 pyrosequencing data—enabling realistic simulation with flowsim

Motivation: The commercial launch of 454 pyrosequencing in 2005 was a milestone in genome sequencing in terms of performance and cost. Throughout the three available releases, average read lengths have increased to ∼500 base pairs and are thus approaching read lengths obtained from traditional Sange...


Bibliographic Details
Main Authors: Balzer, Susanne; Malde, Ketil; Lanzén, Anders; Sharma, Animesh; Jonassen, Inge
Format: Text
Language: English
Published: Oxford University Press 2010
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2935434/
https://www.ncbi.nlm.nih.gov/pubmed/20823302
http://dx.doi.org/10.1093/bioinformatics/btq365
author Balzer, Susanne
Malde, Ketil
Lanzén, Anders
Sharma, Animesh
Jonassen, Inge
collection PubMed
description Motivation: The commercial launch of 454 pyrosequencing in 2005 was a milestone in genome sequencing in terms of performance and cost. Throughout the three available releases, average read lengths have increased to ∼500 base pairs and are thus approaching read lengths obtained from traditional Sanger sequencing. Study design of sequencing projects would benefit from being able to simulate experiments. Results: We explore 454 raw data to investigate its characteristics and derive empirical distributions for the flow values generated by pyrosequencing. Based on our findings, we implement Flowsim, a simulator that generates realistic pyrosequencing data files of arbitrary size from a given set of input DNA sequences. We finally use our simulator to examine the impact of sequence lengths on the results of concrete whole-genome assemblies, and we suggest its use in planning of sequencing projects, benchmarking of assembly methods and other fields. Availability: Flowsim is freely available under the General Public License from http://blog.malde.org/index.php/flowsim/ Contact: susanne.balzer@imr.no; ketil.malde@imr.no
format Text
id pubmed-2935434
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-2935434
2010-09-08
Characteristics of 454 pyrosequencing data—enabling realistic simulation with flowsim
Balzer, Susanne
Malde, Ketil
Lanzén, Anders
Sharma, Animesh
Jonassen, Inge
Bioinformatics
Eccb 2010 Conference Proceedings September 26 to September 29, 2010, Ghent, Belgium
Oxford University Press
2010-09-15
2010-09-04
/pmc/articles/PMC2935434/
/pubmed/20823302
http://dx.doi.org/10.1093/bioinformatics/btq365
Text
en
© The Author(s) 2010. Published by Oxford University Press.
http://creativecommons.org/licenses/by-nc/2.0/uk/
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Characteristics of 454 pyrosequencing data—enabling realistic simulation with flowsim
topic Eccb 2010 Conference Proceedings September 26 to September 29, 2010, Ghent, Belgium
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2935434/
https://www.ncbi.nlm.nih.gov/pubmed/20823302
http://dx.doi.org/10.1093/bioinformatics/btq365