PaSS: a sequencing simulator for PacBio sequencing

BACKGROUND: Third-generation sequencing platforms, such as PacBio sequencing, have been developed rapidly in recent years. PacBio sequencing generates much longer reads than the second-generation sequencing (or the next generation sequencing, NGS) technologies and it has unique sequencing error patt...

Bibliographic Details
Main authors: Zhang, Wenmin; Jia, Ben; Wei, Chaochun
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6588853/
https://www.ncbi.nlm.nih.gov/pubmed/31226925
http://dx.doi.org/10.1186/s12859-019-2901-7
author Zhang, Wenmin
Jia, Ben
Wei, Chaochun
collection PubMed
description BACKGROUND: Third-generation sequencing platforms, such as PacBio sequencing, have been developed rapidly in recent years. PacBio sequencing generates much longer reads than the second-generation sequencing (or the next generation sequencing, NGS) technologies and it has unique sequencing error patterns. An effective read simulator is essential to evaluate and promote the development of new bioinformatics tools for PacBio sequencing data analysis. RESULTS: We developed a new PacBio Sequencing Simulator (PaSS). It can learn sequence patterns from PacBio sequencing data currently available. In addition to the distribution of read lengths and error rates, we included a context-specific sequencing error model. Compared to existing PacBio sequencing simulators such as PBSIM, LongISLND and NPBSS, PaSS performed better in many aspects. Assembly tests also suggest that reads simulated by PaSS are the most similar to experimental sequencing data. CONCLUSION: PaSS is an effective sequence simulator for PacBio sequencing. It will facilitate the evaluation and development of new analysis tools for the third-generation sequencing data. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-019-2901-7) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-6588853
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6588853 2019-07-08 PaSS: a sequencing simulator for PacBio sequencing Zhang, Wenmin; Jia, Ben; Wei, Chaochun BMC Bioinformatics Software BioMed Central 2019-06-21 /pmc/articles/PMC6588853/ /pubmed/31226925 http://dx.doi.org/10.1186/s12859-019-2901-7 Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title PaSS: a sequencing simulator for PacBio sequencing
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6588853/
https://www.ncbi.nlm.nih.gov/pubmed/31226925
http://dx.doi.org/10.1186/s12859-019-2901-7