
SimBA: A methodology and tools for evaluating the performance of RNA-Seq bioinformatic pipelines

BACKGROUND: The evolution of next-generation sequencing (NGS) technologies has led to increased focus on RNA-Seq. Many bioinformatic tools have been developed for RNA-Seq analysis, each with unique performance characteristics and configuration parameters. Users face an increasingly complex task in u...


Bibliographic Details
Main authors: Audoux, Jérôme; Salson, Mikaël; Grosset, Christophe F.; Beaumeunier, Sacha; Holder, Jean-Marc; Commes, Thérèse; Philippe, Nicolas
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5623974/
https://www.ncbi.nlm.nih.gov/pubmed/28969586
http://dx.doi.org/10.1186/s12859-017-1831-5
author Audoux, Jérôme
Salson, Mikaël
Grosset, Christophe F.
Beaumeunier, Sacha
Holder, Jean-Marc
Commes, Thérèse
Philippe, Nicolas
author_sort Audoux, Jérôme
collection PubMed
description BACKGROUND: The evolution of next-generation sequencing (NGS) technologies has led to increased focus on RNA-Seq. Many bioinformatic tools have been developed for RNA-Seq analysis, each with unique performance characteristics and configuration parameters. Users face an increasingly complex task in understanding which bioinformatic tools are best for their specific needs and how they should be configured. In order to provide some answers to these questions, we investigate the performance of leading bioinformatic tools designed for RNA-Seq analysis and propose a methodology for systematic evaluation and comparison of performance to help users make well-informed choices. RESULTS: To evaluate RNA-Seq pipelines, we developed a suite of two benchmarking tools. SimCT generates simulated datasets that get as close as possible to specific real biological conditions, accompanied by the list of genomic incidents and mutations that have been inserted. BenchCT then compares the output of any bioinformatics pipeline that has been run against a SimCT dataset with the simulated genomic and transcriptional variations it contains, to give an accurate evaluation of the pipeline's performance in addressing a specific biological question. We used these tools to simulate a real-world genomic medicine question involving the comparison of healthy and cancerous cells. Results revealed that performance in addressing a particular biological context varied significantly depending on the choice of tools and settings used. We also found that by combining the output of certain pipelines, substantial performance improvements could be achieved. CONCLUSION: Our research emphasizes the importance of selecting and configuring bioinformatic tools for the specific biological question being investigated to obtain optimal results. Pipeline designers, developers and users should include benchmarking in the context of their biological question as part of their design and quality control process. Our SimBA suite of benchmarking tools provides a reliable basis for comparing the performance of RNA-Seq bioinformatics pipelines in addressing a specific biological question. We would like to see the creation of a reference corpus of data-sets that would allow accurate comparison between benchmarks performed by different groups and the publication of more benchmarks based on this public corpus. SimBA software and data-set are available at http://cractools.gforge.inria.fr/softwares/simba/. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-017-1831-5) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5623974
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5623974 2017-10-12 BMC Bioinformatics Software BioMed Central 2017-09-29 /pmc/articles/PMC5623974/ /pubmed/28969586 http://dx.doi.org/10.1186/s12859-017-1831-5 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title SimBA: A methodology and tools for evaluating the performance of RNA-Seq bioinformatic pipelines
topic Software