SimFuse: A Novel Fusion Simulator for RNA Sequencing (RNA-Seq) Data
The performance evaluation of fusion detection algorithms from high-throughput sequencing data crucially relies on the availability of data with known positive and negative cases of gene rearrangements. The use of simulated data circumvents some shortcomings of real data by generation of an unlimite...
Main Authors: | Tan, Yuxiang; Tambouret, Yann; Monti, Stefano |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Hindawi Publishing Corporation 2015 |
Subjects: | Research Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4709598/ https://www.ncbi.nlm.nih.gov/pubmed/26839886 http://dx.doi.org/10.1155/2015/780519 |
_version_ | 1782409670254657536 |
---|---|
author | Tan, Yuxiang Tambouret, Yann Monti, Stefano |
author_facet | Tan, Yuxiang Tambouret, Yann Monti, Stefano |
author_sort | Tan, Yuxiang |
collection | PubMed |
description | The performance evaluation of fusion detection algorithms from high-throughput sequencing data crucially relies on the availability of data with known positive and negative cases of gene rearrangements. The use of simulated data circumvents some shortcomings of real data by generation of an unlimited number of true and false positive events, and the consequent robust estimation of accuracy measures, such as precision and recall. Although a few simulated fusion datasets from RNA Sequencing (RNA-Seq) are available, they are of limited sample size. This makes it difficult to systematically evaluate the performance of RNA-Seq based fusion-detection algorithms. Here, we present SimFuse to address this problem. SimFuse utilizes real sequencing data as the fusions' background to closely approximate the distribution of reads from a real sequencing library and uses a reference genome as the template from which to simulate fusions' supporting reads. To assess the supporting read-specific performance, SimFuse generates multiple datasets with various numbers of fusion supporting reads. Compared to an extant simulated dataset, SimFuse gives users control over the supporting read features and the sample size of the simulated library, based on which the performance metrics needed for the validation and comparison of alternative fusion-detection algorithms can be rigorously estimated. |
format | Online Article Text |
id | pubmed-4709598 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | Hindawi Publishing Corporation |
record_format | MEDLINE/PubMed |
spelling | pubmed-47095982016-02-02 SimFuse: A Novel Fusion Simulator for RNA Sequencing (RNA-Seq) Data Tan, Yuxiang Tambouret, Yann Monti, Stefano Biomed Res Int Research Article The performance evaluation of fusion detection algorithms from high-throughput sequencing data crucially relies on the availability of data with known positive and negative cases of gene rearrangements. The use of simulated data circumvents some shortcomings of real data by generation of an unlimited number of true and false positive events, and the consequent robust estimation of accuracy measures, such as precision and recall. Although a few simulated fusion datasets from RNA Sequencing (RNA-Seq) are available, they are of limited sample size. This makes it difficult to systematically evaluate the performance of RNA-Seq based fusion-detection algorithms. Here, we present SimFuse to address this problem. SimFuse utilizes real sequencing data as the fusions' background to closely approximate the distribution of reads from a real sequencing library and uses a reference genome as the template from which to simulate fusions' supporting reads. To assess the supporting read-specific performance, SimFuse generates multiple datasets with various numbers of fusion supporting reads. Compared to an extant simulated dataset, SimFuse gives users control over the supporting read features and the sample size of the simulated library, based on which the performance metrics needed for the validation and comparison of alternative fusion-detection algorithms can be rigorously estimated. Hindawi Publishing Corporation 2015 2015-12-29 /pmc/articles/PMC4709598/ /pubmed/26839886 http://dx.doi.org/10.1155/2015/780519 Text en Copyright © 2015 Yuxiang Tan et al. 
https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Tan, Yuxiang Tambouret, Yann Monti, Stefano SimFuse: A Novel Fusion Simulator for RNA Sequencing (RNA-Seq) Data |
title | SimFuse: A Novel Fusion Simulator for RNA Sequencing (RNA-Seq) Data |
title_full | SimFuse: A Novel Fusion Simulator for RNA Sequencing (RNA-Seq) Data |
title_fullStr | SimFuse: A Novel Fusion Simulator for RNA Sequencing (RNA-Seq) Data |
title_full_unstemmed | SimFuse: A Novel Fusion Simulator for RNA Sequencing (RNA-Seq) Data |
title_short | SimFuse: A Novel Fusion Simulator for RNA Sequencing (RNA-Seq) Data |
title_sort | simfuse: a novel fusion simulator for rna sequencing (rna-seq) data |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4709598/ https://www.ncbi.nlm.nih.gov/pubmed/26839886 http://dx.doi.org/10.1155/2015/780519 |
work_keys_str_mv | AT tanyuxiang simfuseanovelfusionsimulatorforrnasequencingrnaseqdata AT tambouretyann simfuseanovelfusionsimulatorforrnasequencingrnaseqdata AT montistefano simfuseanovelfusionsimulatorforrnasequencingrnaseqdata |