ReSeq simulates realistic Illumina high-throughput sequencing data

Bibliographic Details
Main Authors: Schmeing, Stephan; Robinson, Mark D.
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7896392/
https://www.ncbi.nlm.nih.gov/pubmed/33608040
http://dx.doi.org/10.1186/s13059-021-02265-7
Description
Summary: In high-throughput sequencing data, performance comparisons between computational tools are essential for making informed decisions at each step of a project. Simulations are a critical part of method comparisons, but for standard Illumina sequencing of genomic DNA, they are often oversimplified, which leads to optimistic results for most tools. ReSeq improves the authenticity of synthetic data by extracting and reproducing key components from real data. Major advancements are the inclusion of systematic errors, a fragment-based coverage model and sampling-matrix estimates based on two-dimensional margins. These improvements lead to more faithful performance evaluations. ReSeq is available at https://github.com/schmeing/ReSeq. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at (10.1186/s13059-021-02265-7).