Data-based filtering for replicated high-throughput transcriptome sequencing experiments

Motivation: RNA sequencing is now widely performed to study differential expression among experimental conditions. As tests are performed on a large number of genes, stringent false-discovery rate control is required at the expense of detection power. Ad hoc filtering techniques are regularly used t...

Bibliographic Details
Main authors: Rau, Andrea, Gallopin, Mélina, Celeux, Gilles, Jaffrézic, Florence
Format: Online Article Text
Language: English
Published: Oxford University Press 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3740625/
https://www.ncbi.nlm.nih.gov/pubmed/23821648
http://dx.doi.org/10.1093/bioinformatics/btt350
author Rau, Andrea
Gallopin, Mélina
Celeux, Gilles
Jaffrézic, Florence
author_sort Rau, Andrea
collection PubMed
description Motivation: RNA sequencing is now widely performed to study differential expression among experimental conditions. As tests are performed on a large number of genes, stringent false-discovery rate control is required at the expense of detection power. Ad hoc filtering techniques are regularly used to moderate this correction by removing genes with low signal, with little attention paid to their impact on downstream analyses. Results: We propose a data-driven method based on the Jaccard similarity index to calculate a filtering threshold for replicated RNA sequencing data. In comparisons with alternative data filters regularly used in practice, we demonstrate the effectiveness of our proposed method to correctly filter lowly expressed genes, leading to increased detection power for moderately to highly expressed genes. Interestingly, this data-driven threshold varies among experiments, highlighting the interest of the method proposed here. Availability: The proposed filtering method is implemented in the R package HTSFilter available on Bioconductor. Contact: andrea.rau@jouy.inra.fr Supplementary information: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-3740625
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-3740625 2013-08-13 Data-based filtering for replicated high-throughput transcriptome sequencing experiments Rau, Andrea Gallopin, Mélina Celeux, Gilles Jaffrézic, Florence Bioinformatics Original Papers Motivation: RNA sequencing is now widely performed to study differential expression among experimental conditions. As tests are performed on a large number of genes, stringent false-discovery rate control is required at the expense of detection power. Ad hoc filtering techniques are regularly used to moderate this correction by removing genes with low signal, with little attention paid to their impact on downstream analyses. Results: We propose a data-driven method based on the Jaccard similarity index to calculate a filtering threshold for replicated RNA sequencing data. In comparisons with alternative data filters regularly used in practice, we demonstrate the effectiveness of our proposed method to correctly filter lowly expressed genes, leading to increased detection power for moderately to highly expressed genes. Interestingly, this data-driven threshold varies among experiments, highlighting the interest of the method proposed here. Availability: The proposed filtering method is implemented in the R package HTSFilter available on Bioconductor. Contact: andrea.rau@jouy.inra.fr Supplementary information: Supplementary data are available at Bioinformatics online. Oxford University Press 2013-09-01 2013-07-02 /pmc/articles/PMC3740625/ /pubmed/23821648 http://dx.doi.org/10.1093/bioinformatics/btt350 Text en © The Author 2013. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/3.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
For commercial re-use, please contact journals.permissions@oup.com
title Data-based filtering for replicated high-throughput transcriptome sequencing experiments
topic Original Papers