
Exact transcript quantification over splice graphs

BACKGROUND: The probability of sequencing a set of RNA-seq reads can be directly modeled using the abundances of splice junctions in splice graphs instead of the abundances of a list of transcripts. We call this model graph quantification, which was first proposed by Bernard et al. (Bioinformatics 3...


Bibliographic Details
Main authors: Ma, Cong; Zheng, Hongyu; Kingsford, Carl
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8112020/
https://www.ncbi.nlm.nih.gov/pubmed/33971903
http://dx.doi.org/10.1186/s13015-021-00184-7
author Ma, Cong
Zheng, Hongyu
Kingsford, Carl
collection PubMed
description BACKGROUND: The probability of sequencing a set of RNA-seq reads can be directly modeled using the abundances of splice junctions in splice graphs instead of the abundances of a list of transcripts. We call this model graph quantification, which was first proposed by Bernard et al. (Bioinformatics 30:2447–55, 2014). The model can be viewed as a generalization of transcript expression quantification where every full path in the splice graph is a possible transcript. However, the previous graph quantification model assumes the length of single-end reads or paired-end fragments is fixed. RESULTS: We provide an improvement of this model to handle variable-length reads or fragments and incorporate bias correction. We prove that our model is equivalent to running a transcript quantifier with exactly the set of all compatible transcripts. The key to our method is constructing an extension of the splice graph based on Aho-Corasick automata. The proof of equivalence is based on a novel reparameterization of the read generation model of a state-of-the-art transcript quantification method. CONCLUSION: We propose a new approach for graph quantification, which is useful for modeling scenarios where the reference transcriptome is incomplete or not available and can be further used in transcriptome assembly or alternative splicing analysis. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13015-021-00184-7.
format Online
Article
Text
id pubmed-8112020
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8112020 2021-05-12 Exact transcript quantification over splice graphs Ma, Cong Zheng, Hongyu Kingsford, Carl Algorithms Mol Biol Research
BioMed Central 2021-05-10 /pmc/articles/PMC8112020/ /pubmed/33971903 http://dx.doi.org/10.1186/s13015-021-00184-7 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Exact transcript quantification over splice graphs
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8112020/
https://www.ncbi.nlm.nih.gov/pubmed/33971903
http://dx.doi.org/10.1186/s13015-021-00184-7