Factorial study of the RNA-seq computational workflow identifies biases as technical gene signatures

RNA-seq is a modular experimental and computational approach aimed at identifying and quantifying RNA molecules. The modularity of the RNA-seq technology enables adaptation of the protocol to develop new ways to explore RNA biology, but this modularity also brings forth the importance of methodolog...

Bibliographic Details
Main authors: Simoneau, Joël; Gosselin, Ryan; Scott, Michelle S
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7671328/
https://www.ncbi.nlm.nih.gov/pubmed/33575596
http://dx.doi.org/10.1093/nargab/lqaa043
collection PubMed
description RNA-seq is a modular experimental and computational approach aimed at identifying and quantifying RNA molecules. The modularity of the RNA-seq technology enables adaptation of the protocol to develop new ways to explore RNA biology, but this modularity also brings forth the importance of methodological thoroughness. Liberty of approach comes with the responsibility of choices, and such choices must be informed. Here, we present an approach that identifies gene group-specific quantification biases in current RNA-seq software and references by processing datasets using diverse RNA-seq computational pipelines, and by decomposing these expression datasets with an independent component analysis matrix factorization method. By exploring the RNA-seq pipeline using this systemic approach, we identify genome annotations as a design choice that affects quantification results to the same extent as the choice of aligners and quantifiers. We also show that the different choices in RNA-seq methodology are not independent, identifying interactions between genome annotations and quantification software. Genes were mainly affected by differences in their sequence, by overlapping genes and by genes with similar sequences. Our approach offers an explanation for the observed biases by identifying the common features used differently by the software and references, thereby providing leads for the betterment of RNA-seq methodology.
format Online
Article
Text
id pubmed-7671328
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-7671328; 2021-02-10; NAR Genom Bioinform; Standard Article; Oxford University Press; 2020-06-29; /pmc/articles/PMC7671328/; /pubmed/33575596; http://dx.doi.org/10.1093/nargab/lqaa043; Text; en; © The Author(s) 2019. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics.
http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com