
Transcript Assembly and Annotations: Bias and Adjustment


Bibliographic Details
Main authors: Zhang, Qimin; Shao, Mingfu
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10153229/
https://www.ncbi.nlm.nih.gov/pubmed/37131680
http://dx.doi.org/10.1101/2023.04.20.537700
MOTIVATION. Transcript annotations play a critical role in gene expression analysis because they serve as the reference for quantifying isoform-level expression. The two main sources of annotations are RefSeq and Ensembl/GENCODE, but differences in their methodologies and information sources can lead to substantial discrepancies between them. The choice of annotation has been shown to significantly affect gene expression analysis. Transcript assembly is also closely linked to annotations: assembling large amounts of available RNA-seq data is an effective, data-driven way to construct annotations, and annotations often serve as benchmarks for evaluating the accuracy of assembly methods. However, the influence of different annotations on transcript assembly is not yet fully understood.

RESULTS. We investigate the impact of annotations on transcript assembly. We observe that evaluating assemblers against different annotations can lead to conflicting conclusions. To understand this striking phenomenon, we compare the structural similarity of annotations at various levels and find that the primary structural difference across annotations occurs at the intron-chain level. Next, we examine the biotypes of annotated and assembled transcripts and uncover a significant bias towards annotating and assembling transcripts with intron retentions, which explains the contradictory conclusions above. We develop a standalone tool, available at https://github.com/Shao-Group/irtool, that can be combined with an assembler to generate an assembly without intron retentions. We evaluate the performance of such a pipeline and offer guidance on selecting appropriate assembly tools for different application scenarios.
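The abstract's two technical ideas, comparing transcripts at the intron-chain level and removing transcripts with intron retentions, can be illustrated with a minimal Python sketch. It assumes each transcript is a list of (start, end) exons on a single chromosome and strand, in 1-based inclusive coordinates. The function names and the rule used to flag a retained intron (an exon that fully spans an intron used by some transcript in the pooled set) are hypothetical illustrations, not the interface or algorithm of irtool.

```python
from typing import Dict, List, Set, Tuple

# Assumption: exons are (start, end), 1-based inclusive, same chromosome/strand.
Exon = Tuple[int, int]
Intron = Tuple[int, int]
Chain = Tuple[Intron, ...]


def intron_chain(exons: List[Exon]) -> Chain:
    """Ordered introns implied by the gaps between consecutive exons."""
    ex = sorted(exons)
    return tuple((ex[i][1] + 1, ex[i + 1][0] - 1) for i in range(len(ex) - 1))


def chain_level_scores(assembly: Dict[str, List[Exon]],
                       annotation: Dict[str, List[Exon]]) -> Tuple[float, float]:
    """Intron-chain precision and recall; single-exon transcripts are skipped."""
    asm_chains = [intron_chain(e) for e in assembly.values() if len(e) > 1]
    ref_chains = [intron_chain(e) for e in annotation.values() if len(e) > 1]
    ref_set, asm_set = set(ref_chains), set(asm_chains)
    # An assembled transcript is correct if its whole intron chain
    # exactly matches some annotated intron chain (and vice versa for recall).
    matched_asm = sum(1 for c in asm_chains if c in ref_set)
    matched_ref = sum(1 for c in ref_chains if c in asm_set)
    precision = matched_asm / len(asm_chains) if asm_chains else 0.0
    recall = matched_ref / len(ref_chains) if ref_chains else 0.0
    return precision, recall


def pooled_introns(transcripts: Dict[str, List[Exon]]) -> Set[Intron]:
    """Union of all introns used by any transcript in the set."""
    introns: Set[Intron] = set()
    for exons in transcripts.values():
        introns.update(intron_chain(exons))
    return introns


def retained_introns(exons: List[Exon], introns: Set[Intron]) -> List[Intron]:
    """Introns that lie entirely inside one of this transcript's exons."""
    return sorted(i for i in introns
                  if any(s <= i[0] and i[1] <= e for s, e in exons))


def drop_intron_retention(transcripts: Dict[str, List[Exon]],
                          introns: Set[Intron]) -> Dict[str, List[Exon]]:
    """Keep only transcripts that retain none of the given introns."""
    return {t: ex for t, ex in transcripts.items()
            if not retained_introns(ex, introns)}
```

For example, with an annotated transcript `ref1 = [(100, 200), (301, 400), (501, 600)]`, an assembled `asm1 = [(120, 200), (301, 400), (501, 580)]` still matches at the intron-chain level, because only the introns (201, 300) and (401, 500) are compared and the terminal exon boundaries are ignored. An assembled `asm2 = [(100, 400), (501, 600)]` has an exon spanning the intron (201, 300), so it is flagged as retaining that intron and is dropped by `drop_intron_retention`.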
bioRxiv, Cold Spring Harbor Laboratory, 2023-04-21. PubMed record pubmed-10153229 (2023-05-03).

This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use.