NanoSim: nanopore sequence read simulator based on statistical characterization

Background: The MinION sequencing instrument from Oxford Nanopore Technologies (ONT) produces long read lengths from single-molecule sequencing – valuable features for detailed genome characterization. To realize the potential of this platform, a number of groups are developing bioinformatics tools...

Bibliographic Details
Main Authors: Yang, Chen, Chu, Justin, Warren, René L, Birol, Inanç
Format: Online Article Text
Language: English
Published: Oxford University Press 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5530317/
https://www.ncbi.nlm.nih.gov/pubmed/28327957
http://dx.doi.org/10.1093/gigascience/gix010
_version_ 1783253245274095616
author Yang, Chen
Chu, Justin
Warren, René L
Birol, Inanç
author_facet Yang, Chen
Chu, Justin
Warren, René L
Birol, Inanç
author_sort Yang, Chen
collection PubMed
description Background: The MinION sequencing instrument from Oxford Nanopore Technologies (ONT) produces long read lengths from single-molecule sequencing – valuable features for detailed genome characterization. To realize the potential of this platform, a number of groups are developing bioinformatics tools tuned for the unique characteristics of its data. We note that these development efforts would benefit from a simulator software, the output of which could be used to benchmark analysis tools. Results: Here, we introduce NanoSim, a fast and scalable read simulator that captures the technology-specific features of ONT data and allows for adjustments upon improvement of nanopore sequencing technology. The first step of NanoSim is read characterization, which provides a comprehensive alignment-based analysis and generates a set of read profiles serving as the input to the next step, the simulation stage. The simulation stage uses the model built in the previous step to produce in silico reads for a given reference genome. NanoSim is written in Python and R. The source files and manual are available at the Genome Sciences Centre website: http://www.bcgsc.ca/platform/bioinfo/software/nanosim. Conclusion: In this work, we model the base-calling errors of ONT reads to inform the simulation of sequences with similar characteristics. We showcase the performance of NanoSim on publicly available datasets generated using the R7 and R7.3 chemistries and different sequencing kits and compare the resulting synthetic reads to those of other long-sequence simulators and experimental ONT reads. We expect NanoSim to have an enabling role in the field and benefit the development of scalable next-generation sequencing technologies for the long nanopore reads, including genome assembly, mutation detection, and even metagenomic analysis software.
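The two stages named in the description (read characterization, then simulation from the resulting profile) can be illustrated with a toy sketch. This is not NanoSim's implementation or interface: the names `ReadProfile`, `characterize`, and `simulate` are hypothetical, the input is assumed to be pairs of gapped alignment strings, and a single flat rate per error type stands in for the statistical models the article describes.

```python
import random
from dataclasses import dataclass


@dataclass
class ReadProfile:
    """Hypothetical profile: per-column error rates and observed read lengths."""
    mismatch_rate: float
    insertion_rate: float
    deletion_rate: float
    read_lengths: list


def characterize(alignments):
    """Stage 1 (toy): build a profile from (reference, read) alignment strings.

    Both strings of a pair must have equal length; '-' marks a gap.
    """
    matches = mismatches = insertions = deletions = 0
    lengths = []
    for ref_aln, read_aln in alignments:
        if len(ref_aln) != len(read_aln):
            raise ValueError("alignment strings must have equal length")
        for r, q in zip(ref_aln, read_aln):
            if r == "-":
                insertions += 1      # base present only in the read
            elif q == "-":
                deletions += 1       # base present only in the reference
            elif r == q:
                matches += 1
            else:
                mismatches += 1
        lengths.append(sum(1 for q in read_aln if q != "-"))
    total = matches + mismatches + insertions + deletions
    if total == 0:
        raise ValueError("no aligned columns to characterize")
    return ReadProfile(mismatches / total, insertions / total,
                       deletions / total, lengths)


def simulate(reference, profile, n_reads, seed=0):
    """Stage 2 (toy): sample reads from the reference and inject errors."""
    rng = random.Random(seed)
    reads = []
    for _ in range(n_reads):
        # Draw a length from the observed lengths, capped at the reference size.
        length = min(rng.choice(profile.read_lengths), len(reference))
        start = rng.randint(0, len(reference) - length)
        read = []
        for base in reference[start:start + length]:
            x = rng.random()
            if x < profile.deletion_rate:
                continue  # drop this base
            if x < profile.deletion_rate + profile.mismatch_rate:
                read.append(rng.choice([b for b in "ACGT" if b != base]))
            else:
                read.append(base)
            if rng.random() < profile.insertion_rate:
                read.append(rng.choice("ACGT"))  # spurious extra base
        reads.append("".join(read))
    return reads
```

A call such as `simulate(genome, characterize(pairs), n_reads=100)` would chain the two stages. Keeping the profile as a separate object mirrors the separation the description draws between the two steps, so a profile can be built once and reused for different reference genomes.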
format Online
Article
Text
id pubmed-5530317
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-5530317 2017-07-31 NanoSim: nanopore sequence read simulator based on statistical characterization Yang, Chen Chu, Justin Warren, René L Birol, Inanç Gigascience Technical Note Oxford University Press 2017-02-24 /pmc/articles/PMC5530317/ /pubmed/28327957 http://dx.doi.org/10.1093/gigascience/gix010 Text en © The Author 2017. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Technical Note
Yang, Chen
Chu, Justin
Warren, René L
Birol, Inanç
NanoSim: nanopore sequence read simulator based on statistical characterization
title NanoSim: nanopore sequence read simulator based on statistical characterization
title_sort nanosim: nanopore sequence read simulator based on statistical characterization
topic Technical Note
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5530317/
https://www.ncbi.nlm.nih.gov/pubmed/28327957
http://dx.doi.org/10.1093/gigascience/gix010
work_keys_str_mv AT yangchen nanosimnanoporesequencereadsimulatorbasedonstatisticalcharacterization
AT chujustin nanosimnanoporesequencereadsimulatorbasedonstatisticalcharacterization
AT warrenrenel nanosimnanoporesequencereadsimulatorbasedonstatisticalcharacterization
AT birolinanc nanosimnanoporesequencereadsimulatorbasedonstatisticalcharacterization