
PBSIM2: a simulator for long-read sequencers with a novel generative model of quality scores

MOTIVATION: Recent advances in high-throughput long-read sequencers, such as PacBio and Oxford Nanopore sequencers, produce longer reads with more errors than short-read sequencers. In addition to the high error rates of reads, non-uniformity of errors leads to difficulties in various downstream ana...

Bibliographic Details
Main authors: Ono, Yukiteru; Asai, Kiyoshi; Hamada, Michiaki
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8097687/
https://www.ncbi.nlm.nih.gov/pubmed/32976553
http://dx.doi.org/10.1093/bioinformatics/btaa835
_version_ 1783688369088233472
author Ono, Yukiteru
Asai, Kiyoshi
Hamada, Michiaki
author_facet Ono, Yukiteru
Asai, Kiyoshi
Hamada, Michiaki
author_sort Ono, Yukiteru
collection PubMed
description MOTIVATION: Recent advances in high-throughput long-read sequencers, such as PacBio and Oxford Nanopore sequencers, produce longer reads with more errors than short-read sequencers. In addition to the high error rates of reads, non-uniformity of errors leads to difficulties in various downstream analyses using long reads. Many useful simulators, which characterize long-read error patterns and simulate them, have been developed. However, there is still room for improvement in the simulation of the non-uniformity of errors. RESULTS: To capture characteristics of errors in reads for long-read sequencers, here, we introduce a generative model for quality scores, in which a hidden Markov Model with a latest model selection method, called factorized information criteria, is utilized. We evaluated our developed simulator from various points, indicating that our simulator successfully simulates reads that are consistent with real reads. AVAILABILITY AND IMPLEMENTATION: The source codes of PBSIM2 are freely available from https://github.com/yukiteruono/pbsim2. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-8097687
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8097687 2021-05-10 PBSIM2: a simulator for long-read sequencers with a novel generative model of quality scores Ono, Yukiteru Asai, Kiyoshi Hamada, Michiaki Bioinformatics Original Papers MOTIVATION: Recent advances in high-throughput long-read sequencers, such as PacBio and Oxford Nanopore sequencers, produce longer reads with more errors than short-read sequencers. In addition to the high error rates of reads, non-uniformity of errors leads to difficulties in various downstream analyses using long reads. Many useful simulators, which characterize long-read error patterns and simulate them, have been developed. However, there is still room for improvement in the simulation of the non-uniformity of errors. RESULTS: To capture characteristics of errors in reads for long-read sequencers, here, we introduce a generative model for quality scores, in which a hidden Markov Model with a latest model selection method, called factorized information criteria, is utilized. We evaluated our developed simulator from various points, indicating that our simulator successfully simulates reads that are consistent with real reads. AVAILABILITY AND IMPLEMENTATION: The source codes of PBSIM2 are freely available from https://github.com/yukiteruono/pbsim2. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. Oxford University Press 2020-09-25 /pmc/articles/PMC8097687/ /pubmed/32976553 http://dx.doi.org/10.1093/bioinformatics/btaa835 Text en © The Author(s) 2020. Published by Oxford University Press. https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
spellingShingle Original Papers
Ono, Yukiteru
Asai, Kiyoshi
Hamada, Michiaki
PBSIM2: a simulator for long-read sequencers with a novel generative model of quality scores
title PBSIM2: a simulator for long-read sequencers with a novel generative model of quality scores
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8097687/
https://www.ncbi.nlm.nih.gov/pubmed/32976553
http://dx.doi.org/10.1093/bioinformatics/btaa835
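
The description in this record says that PBSIM2 generates quality scores with a hidden Markov model. As a rough illustration of that general idea only, the Python sketch below samples a sequence of quality scores from a small two-state HMM and encodes it as a FASTQ quality string. Everything in it is an assumption made for illustration: the function names, the two states, and all probabilities are hypothetical toy values, not PBSIM2's trained parameters or its actual implementation. Model selection with factorized information criteria is not shown.

```python
import random

# Hypothetical toy parameters (NOT taken from PBSIM2).
# State 0 stands for a "low-quality" region, state 1 for a "high-quality" region.
START = [0.5, 0.5]                      # initial state probabilities
TRANS = [[0.9, 0.1],                    # transition probabilities from state 0
         [0.05, 0.95]]                  # transition probabilities from state 1
EMIT = [{5: 0.5, 8: 0.3, 12: 0.2},      # quality score -> probability in state 0
        {20: 0.2, 30: 0.5, 40: 0.3}]    # quality score -> probability in state 1


def sample_quality_scores(length, start_probs, trans_probs, emit_probs, seed=None):
    """Sample `length` quality scores from a discrete HMM."""
    rng = random.Random(seed)
    states = list(range(len(start_probs)))
    # Draw the initial hidden state.
    state = rng.choices(states, weights=start_probs)[0]
    scores = []
    for _ in range(length):
        # Emit one quality score from the current hidden state.
        quals = list(emit_probs[state].keys())
        weights = list(emit_probs[state].values())
        scores.append(rng.choices(quals, weights=weights)[0])
        # Move to the next hidden state.
        state = rng.choices(states, weights=trans_probs[state])[0]
    return scores


def to_fastq_quality(scores, offset=33):
    """Encode integer quality scores as a Phred+33 FASTQ quality string."""
    return "".join(chr(q + offset) for q in scores)


if __name__ == "__main__":
    scores = sample_quality_scores(60, START, TRANS, EMIT, seed=0)
    print(to_fastq_quality(scores))
```

Because the hidden state tends to persist from one position to the next, the sampled scores come in runs of low or high quality rather than being independent per base, which is one simple way to picture the non-uniformity of errors the description refers to.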