
A fuzzy method for RNA-Seq differential expression analysis in presence of multireads

Bibliographic Details
Main authors: Consiglio, Arianna; Mencar, Corrado; Grillo, Giorgio; Marzano, Flaviana; Caratozzolo, Mariano Francesco; Liuni, Sabino
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects: Research
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5123383/
https://www.ncbi.nlm.nih.gov/pubmed/28185579
http://dx.doi.org/10.1186/s12859-016-1195-2
_version_ 1782469725313302528
author Consiglio, Arianna
Mencar, Corrado
Grillo, Giorgio
Marzano, Flaviana
Caratozzolo, Mariano Francesco
Liuni, Sabino
collection PubMed
description BACKGROUND: When the reads obtained from high-throughput RNA sequencing are mapped against a reference database, a significant proportion of them, known as multireads, can map to more than one reference sequence. These multireads originate from gene duplications, repetitive regions or overlapping genes. In RNA-Seq analyses, removing the multireads from the mapping results causes an underestimation of the read counts, while estimating the real read count can lead to false positives during the detection of differentially expressed sequences. RESULTS: We present an innovative approach, based entirely on fuzzy set theory, to deal with multireads and evaluate differential expression events. Since multireads cause uncertainty in the estimation of read counts during gene expression computation, they can also reduce the reliability of differential expression analysis results by producing false positives. Our method manages the uncertainty in gene expression estimation by defining fuzzy read counts, and it evaluates the possibility that a gene is differentially expressed with three fuzzy concepts: over-expression, same-expression and under-expression. The output of the method is a list of differentially expressed genes enriched with information about the uncertainty of the results due to the presence of multireads. We tested the method on RNA-Seq data designed for case-control studies and compared the results with those of existing tools for read count estimation and differential expression analysis. CONCLUSIONS: Managing multireads with fuzzy sets makes it possible to obtain a list of differential expression events that takes into account the uncertainty in the results caused by the presence of multireads. Biologists can use this additional information when selecting the most relevant differential expression events to validate with laboratory assays. Our method can be used to compute reliable differential expression events and to highlight possible false positives in the lists of differentially expressed genes computed with other tools. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-1195-2) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5123383
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5123383 2016-12-08 A fuzzy method for RNA-Seq differential expression analysis in presence of multireads Consiglio, Arianna Mencar, Corrado Grillo, Giorgio Marzano, Flaviana Caratozzolo, Mariano Francesco Liuni, Sabino BMC Bioinformatics Research BioMed Central 2016-11-08 /pmc/articles/PMC5123383/ /pubmed/28185579 http://dx.doi.org/10.1186/s12859-016-1195-2 Text en © The Author(s). 2016 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title A fuzzy method for RNA-Seq differential expression analysis in presence of multireads
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5123383/
https://www.ncbi.nlm.nih.gov/pubmed/28185579
http://dx.doi.org/10.1186/s12859-016-1195-2