SimFuse: A Novel Fusion Simulator for RNA Sequencing (RNA-Seq) Data

Bibliographic Details
Main Authors: Tan, Yuxiang; Tambouret, Yann; Monti, Stefano
Format: Online, Article, Text
Language: English
Published: Hindawi Publishing Corporation, 2015
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4709598/
https://www.ncbi.nlm.nih.gov/pubmed/26839886
http://dx.doi.org/10.1155/2015/780519
author Tan, Yuxiang
Tambouret, Yann
Monti, Stefano
collection PubMed
description The performance evaluation of fusion detection algorithms from high-throughput sequencing data crucially relies on the availability of data with known positive and negative cases of gene rearrangements. The use of simulated data circumvents some shortcomings of real data by generation of an unlimited number of true and false positive events, and the consequent robust estimation of accuracy measures, such as precision and recall. Although a few simulated fusion datasets from RNA Sequencing (RNA-Seq) are available, they are of limited sample size. This makes it difficult to systematically evaluate the performance of RNA-Seq based fusion-detection algorithms. Here, we present SimFuse to address this problem. SimFuse utilizes real sequencing data as the fusions' background to closely approximate the distribution of reads from a real sequencing library and uses a reference genome as the template from which to simulate fusions' supporting reads. To assess the supporting read-specific performance, SimFuse generates multiple datasets with various numbers of fusion supporting reads. Compared to an extant simulated dataset, SimFuse gives users control over the supporting read features and the sample size of the simulated library, based on which the performance metrics needed for the validation and comparison of alternative fusion-detection algorithms can be rigorously estimated.
format Online
Article
Text
id pubmed-4709598
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Hindawi Publishing Corporation
record_format MEDLINE/PubMed
spelling pubmed-4709598 (2016-02-02). Biomed Res Int, Research Article. Hindawi Publishing Corporation, 2015; published 2015-12-29. Text, en. Copyright © 2015 Yuxiang Tan et al.
License: https://creativecommons.org/licenses/by/4.0/. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title SimFuse: A Novel Fusion Simulator for RNA Sequencing (RNA-Seq) Data
topic Research Article
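The abstract describes SimFuse's strategy only at a high level: fusion supporting reads are simulated from a reference template, mixed into a real-data background, generated at several supporting-read counts, and then used to estimate precision and recall. The following is a minimal sketch of that idea under stated assumptions, not SimFuse's actual implementation. It joins a hypothetical 5' partner sequence to a hypothetical 3' partner at assumed breakpoints, samples single-end, forward-strand, error-free reads that cross the junction with a minimum overhang on each side, appends them to a stand-in background read list, and scores a set of fusion calls. All function names, sequences, parameters (`read_len`, `min_overhang`, `support_levels`), and fusion labels such as `A--B` are hypothetical.

```python
import random
from typing import Dict, List, Set, Tuple


def make_fusion_transcript(five_prime: str, three_prime: str,
                           bp5: int, bp3: int) -> Tuple[str, int]:
    """Join five_prime[:bp5] to three_prime[bp3:] (hypothetical breakpoints).

    Returns the fused template and the 0-based offset of the junction.
    """
    return five_prime[:bp5] + three_prime[bp3:], bp5


def simulate_spanning_reads(fused: str, junction: int, read_len: int,
                            n_reads: int, min_overhang: int,
                            rng: random.Random) -> List[str]:
    """Sample n_reads error-free, forward-strand reads that cross the junction.

    A read starting at s covers fused[s:s + read_len] and keeps at least
    min_overhang bases on each side of the junction when
    junction - read_len + min_overhang <= s <= junction - min_overhang.
    """
    lo = max(0, junction - read_len + min_overhang)
    hi = min(len(fused) - read_len, junction - min_overhang)
    if lo > hi:
        raise ValueError("no start position satisfies read_len/min_overhang")
    starts = [rng.randint(lo, hi) for _ in range(n_reads)]  # inclusive bounds
    return [fused[s:s + read_len] for s in starts]


def build_datasets(background: List[str], fused: str, junction: int,
                   support_levels: List[int], read_len: int = 10,
                   min_overhang: int = 3, seed: int = 0) -> Dict[int, List[str]]:
    """Build one simulated library per supporting-read count.

    Every library shares the same background reads (a stand-in for real
    RNA-Seq data) and adds k junction-spanning reads for each k.
    """
    rng = random.Random(seed)
    datasets: Dict[int, List[str]] = {}
    for k in support_levels:
        reads = simulate_spanning_reads(fused, junction, read_len, k,
                                        min_overhang, rng)
        datasets[k] = background + reads
    return datasets


def precision_recall(truth: Set[str], calls: Set[str]) -> Tuple[float, float]:
    """Precision = TP / number of calls; recall = TP / number of true fusions."""
    tp = len(truth & calls)
    precision = tp / len(calls) if calls else 0.0
    recall = tp / len(truth) if truth else 0.0
    return precision, recall


if __name__ == "__main__":
    gene_a = "ACGTACGTACGTACGT"  # hypothetical 5' partner
    gene_b = "TTGGCCAATTGGCCAA"  # hypothetical 3' partner
    fused, junction = make_fusion_transcript(gene_a, gene_b, bp5=8, bp3=8)
    background = ["GATTACAGAT", "CCCCGGGGAA"]  # stand-in for real background reads
    libs = build_datasets(background, fused, junction, support_levels=[1, 5, 10])
    for k, reads in libs.items():
        print(k, len(reads))  # supporting-read count, library size
    # Hypothetical truth set vs. hypothetical detector calls.
    print(precision_recall({"A--B", "C--D"}, {"A--B", "E--F"}))
```

For the toy call set, one of two calls is correct and one of two true fusions is recovered, so the last line prints `(0.5, 0.5)`. Keeping the background fixed while varying only the number of spanning reads isolates how detection depends on supporting-read count; a realistic simulator would also need paired-end reads, sequencing errors, and both strands, which this sketch omits.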