Alignment and mapping methodology influence transcript abundance estimation

BACKGROUND: The accuracy of transcript quantification using RNA-seq data depends on many factors, such as the choice of alignment or mapping method and the quantification model being adopted. While the choice of quantification model has been shown to be important, considerably less attention has bee...

Bibliographic Details
Main authors: Srivastava, Avi; Malik, Laraib; Sarkar, Hirak; Zakeri, Mohsen; Almodaresi, Fatemeh; Soneson, Charlotte; Love, Michael I.; Kingsford, Carl; Patro, Rob
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7487471/
https://www.ncbi.nlm.nih.gov/pubmed/32894187
http://dx.doi.org/10.1186/s13059-020-02151-8
author Srivastava, Avi
Malik, Laraib
Sarkar, Hirak
Zakeri, Mohsen
Almodaresi, Fatemeh
Soneson, Charlotte
Love, Michael I.
Kingsford, Carl
Patro, Rob
author_sort Srivastava, Avi
collection PubMed
description BACKGROUND: The accuracy of transcript quantification using RNA-seq data depends on many factors, such as the choice of alignment or mapping method and the quantification model being adopted. While the choice of quantification model has been shown to be important, considerably less attention has been given to comparing the effect of various read alignment approaches on quantification accuracy. RESULTS: We investigate the influence of mapping and alignment on the accuracy of transcript quantification in both simulated and experimental data, as well as the effect on subsequent differential expression analysis. We observe that, even when the quantification model itself is held fixed, the effect of choosing a different alignment methodology, or aligning reads using different parameters, on quantification estimates can sometimes be large and can affect downstream differential expression analyses as well. These effects can go unnoticed when assessment is focused too heavily on simulated data, where the alignment task is often simpler than in experimentally acquired samples. We also introduce a new alignment methodology, called selective alignment, to overcome the shortcomings of lightweight approaches without incurring the computational cost of traditional alignment. CONCLUSION: We observe that, on experimental datasets, the performance of lightweight mapping and alignment-based approaches varies significantly, and highlight some of the underlying factors. We show this variation both in terms of quantification and downstream differential expression analysis. In all comparisons, we also show the improved performance of our proposed selective alignment method and suggest best practices for performing RNA-seq quantification.
format Online
Article
Text
id pubmed-7487471
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7487471 2020-09-15 Alignment and mapping methodology influence transcript abundance estimation Srivastava, Avi Malik, Laraib Sarkar, Hirak Zakeri, Mohsen Almodaresi, Fatemeh Soneson, Charlotte Love, Michael I. Kingsford, Carl Patro, Rob Genome Biol Research BioMed Central 2020-09-07 /pmc/articles/PMC7487471/ /pubmed/32894187 http://dx.doi.org/10.1186/s13059-020-02151-8 Text en
© The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Alignment and mapping methodology influence transcript abundance estimation
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7487471/
https://www.ncbi.nlm.nih.gov/pubmed/32894187
http://dx.doi.org/10.1186/s13059-020-02151-8