ReSeq simulates realistic Illumina high-throughput sequencing data

In high-throughput sequencing data, performance comparisons between computational tools are essential for making informed decisions at each step of a project. Simulations are a critical part of method comparisons, but for standard Illumina sequencing of genomic DNA, they are often oversimplified, wh...

Bibliographic Details
Main authors: Schmeing, Stephan; Robinson, Mark D.
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7896392/
https://www.ncbi.nlm.nih.gov/pubmed/33608040
http://dx.doi.org/10.1186/s13059-021-02265-7
_version_ 1783653533592059904
author Schmeing, Stephan
Robinson, Mark D.
author_facet Schmeing, Stephan
Robinson, Mark D.
author_sort Schmeing, Stephan
collection PubMed
description In high-throughput sequencing data, performance comparisons between computational tools are essential for making informed decisions at each step of a project. Simulations are a critical part of method comparisons, but for standard Illumina sequencing of genomic DNA, they are often oversimplified, which leads to optimistic results for most tools. ReSeq improves the authenticity of synthetic data by extracting and reproducing key components from real data. Major advancements are the inclusion of systematic errors, a fragment-based coverage model and sampling-matrix estimates based on two-dimensional margins. These improvements lead to more faithful performance evaluations. ReSeq is available at https://github.com/schmeing/ReSeq. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at (10.1186/s13059-021-02265-7).
format Online
Article
Text
id pubmed-7896392
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-78963922021-02-22 ReSeq simulates realistic Illumina high-throughput sequencing data Schmeing, Stephan Robinson, Mark D. Genome Biol Method In high-throughput sequencing data, performance comparisons between computational tools are essential for making informed decisions at each step of a project. Simulations are a critical part of method comparisons, but for standard Illumina sequencing of genomic DNA, they are often oversimplified, which leads to optimistic results for most tools. ReSeq improves the authenticity of synthetic data by extracting and reproducing key components from real data. Major advancements are the inclusion of systematic errors, a fragment-based coverage model and sampling-matrix estimates based on two-dimensional margins. These improvements lead to more faithful performance evaluations. ReSeq is available at https://github.com/schmeing/ReSeq. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at (10.1186/s13059-021-02265-7). BioMed Central 2021-02-19 /pmc/articles/PMC7896392/ /pubmed/33608040 http://dx.doi.org/10.1186/s13059-021-02265-7 Text en © The Author(s) 2021 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. 
To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle Method
Schmeing, Stephan
Robinson, Mark D.
ReSeq simulates realistic Illumina high-throughput sequencing data
title ReSeq simulates realistic Illumina high-throughput sequencing data
title_full ReSeq simulates realistic Illumina high-throughput sequencing data
title_fullStr ReSeq simulates realistic Illumina high-throughput sequencing data
title_full_unstemmed ReSeq simulates realistic Illumina high-throughput sequencing data
title_short ReSeq simulates realistic Illumina high-throughput sequencing data
title_sort reseq simulates realistic illumina high-throughput sequencing data
topic Method
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7896392/
https://www.ncbi.nlm.nih.gov/pubmed/33608040
http://dx.doi.org/10.1186/s13059-021-02265-7
work_keys_str_mv AT schmeingstephan reseqsimulatesrealisticilluminahighthroughputsequencingdata
AT robinsonmarkd reseqsimulatesrealisticilluminahighthroughputsequencingdata