Characteristics of 454 pyrosequencing data—enabling realistic simulation with flowsim
Motivation: The commercial launch of 454 pyrosequencing in 2005 was a milestone in genome sequencing in terms of performance and cost. Throughout the three available releases, average read lengths have increased to ∼500 base pairs and are thus approaching read lengths obtained from traditional Sange...
Main authors: | Balzer, Susanne; Malde, Ketil; Lanzén, Anders; Sharma, Animesh; Jonassen, Inge |
---|---|
Format: | Text |
Language: | English |
Published: | Oxford University Press 2010 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2935434/ https://www.ncbi.nlm.nih.gov/pubmed/20823302 http://dx.doi.org/10.1093/bioinformatics/btq365 |
_version_ | 1782186402147991552 |
---|---|
author | Balzer, Susanne Malde, Ketil Lanzén, Anders Sharma, Animesh Jonassen, Inge |
author_facet | Balzer, Susanne Malde, Ketil Lanzén, Anders Sharma, Animesh Jonassen, Inge |
author_sort | Balzer, Susanne |
collection | PubMed |
description | Motivation: The commercial launch of 454 pyrosequencing in 2005 was a milestone in genome sequencing in terms of performance and cost. Throughout the three available releases, average read lengths have increased to ∼500 base pairs and are thus approaching read lengths obtained from traditional Sanger sequencing. Study design of sequencing projects would benefit from being able to simulate experiments. Results: We explore 454 raw data to investigate its characteristics and derive empirical distributions for the flow values generated by pyrosequencing. Based on our findings, we implement Flowsim, a simulator that generates realistic pyrosequencing data files of arbitrary size from a given set of input DNA sequences. We finally use our simulator to examine the impact of sequence lengths on the results of concrete whole-genome assemblies, and we suggest its use in planning of sequencing projects, benchmarking of assembly methods and other fields. Availability: Flowsim is freely available under the General Public License from http://blog.malde.org/index.php/flowsim/ Contact: susanne.balzer@imr.no; ketil.malde@imr.no |
format | Text |
id | pubmed-2935434 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2010 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-29354342010-09-08 Characteristics of 454 pyrosequencing data—enabling realistic simulation with flowsim Balzer, Susanne Malde, Ketil Lanzén, Anders Sharma, Animesh Jonassen, Inge Bioinformatics Eccb 2010 Conference Proceedings September 26 to September 29, 2010, Ghent, Belgium Motivation: The commercial launch of 454 pyrosequencing in 2005 was a milestone in genome sequencing in terms of performance and cost. Throughout the three available releases, average read lengths have increased to ∼500 base pairs and are thus approaching read lengths obtained from traditional Sanger sequencing. Study design of sequencing projects would benefit from being able to simulate experiments. Results: We explore 454 raw data to investigate its characteristics and derive empirical distributions for the flow values generated by pyrosequencing. Based on our findings, we implement Flowsim, a simulator that generates realistic pyrosequencing data files of arbitrary size from a given set of input DNA sequences. We finally use our simulator to examine the impact of sequence lengths on the results of concrete whole-genome assemblies, and we suggest its use in planning of sequencing projects, benchmarking of assembly methods and other fields. Availability: Flowsim is freely available under the General Public License from http://blog.malde.org/index.php/flowsim/ Contact: susanne.balzer@imr.no; ketil.malde@imr.no Oxford University Press 2010-09-15 2010-09-04 /pmc/articles/PMC2935434/ /pubmed/20823302 http://dx.doi.org/10.1093/bioinformatics/btq365 Text en © The Author(s) 2010. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/2.0/uk/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Eccb 2010 Conference Proceedings September 26 to September 29, 2010, Ghent, Belgium Balzer, Susanne Malde, Ketil Lanzén, Anders Sharma, Animesh Jonassen, Inge Characteristics of 454 pyrosequencing data—enabling realistic simulation with flowsim |
title | Characteristics of 454 pyrosequencing data—enabling realistic simulation with flowsim |
title_full | Characteristics of 454 pyrosequencing data—enabling realistic simulation with flowsim |
title_fullStr | Characteristics of 454 pyrosequencing data—enabling realistic simulation with flowsim |
title_full_unstemmed | Characteristics of 454 pyrosequencing data—enabling realistic simulation with flowsim |
title_short | Characteristics of 454 pyrosequencing data—enabling realistic simulation with flowsim |
title_sort | characteristics of 454 pyrosequencing data—enabling realistic simulation with flowsim |
topic | Eccb 2010 Conference Proceedings September 26 to September 29, 2010, Ghent, Belgium |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2935434/ https://www.ncbi.nlm.nih.gov/pubmed/20823302 http://dx.doi.org/10.1093/bioinformatics/btq365 |