Read trimming is not required for mapping and quantification of RNA-seq reads at the gene level

RNA sequencing (RNA-seq) is currently the standard method for genome-wide expression profiling. RNA-seq reads often need to be mapped to a reference genome before read counts can be produced for genes. Read trimming methods have been developed to assist read mapping by removing adapter sequences and...

Bibliographic Details
Main authors: Liao, Yang; Shi, Wei
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7671312/
https://www.ncbi.nlm.nih.gov/pubmed/33575617
http://dx.doi.org/10.1093/nargab/lqaa068
_version_ 1783610906387677184
author Liao, Yang
Shi, Wei
author_facet Liao, Yang
Shi, Wei
author_sort Liao, Yang
collection PubMed
description RNA sequencing (RNA-seq) is currently the standard method for genome-wide expression profiling. RNA-seq reads often need to be mapped to a reference genome before read counts can be produced for genes. Read trimming methods have been developed to assist read mapping by removing adapter sequences and low-sequencing-quality bases. It is, however, unclear what impact read trimming has on the quantification of RNA-seq data, an important task in RNA-seq data analysis. In this study, we used a benchmark RNA-seq dataset and simulation data to assess the impact of read trimming on mapping and quantification of RNA-seq reads. We found that adapter sequences can be effectively removed by the read aligner via 'soft-clipping' and that many low-sequencing-quality bases, which would be removed by read trimming tools, were rescued by the aligner. The accuracy of gene expression quantification using untrimmed reads was found to be comparable to or slightly better than that using trimmed reads, based on Pearson correlation with reverse transcriptase-polymerase chain reaction data and simulation truth. Total data analysis time was reduced by up to an order of magnitude when read trimming was not performed. Our study suggests that read trimming is a redundant process in the quantification of RNA-seq expression data.
format Online
Article
Text
id pubmed-7671312
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-7671312 2021-02-10 Read trimming is not required for mapping and quantification of RNA-seq reads at the gene level Liao, Yang Shi, Wei NAR Genom Bioinform Application Notes Oxford University Press 2020-09-03 /pmc/articles/PMC7671312/ /pubmed/33575617 http://dx.doi.org/10.1093/nargab/lqaa068 Text en © The Author(s) 2019. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics.
http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Read trimming is not required for mapping and quantification of RNA-seq reads at the gene level
topic Application Notes