MGcount: a total RNA-seq quantification tool to address multi-mapping and multi-overlapping alignments ambiguity in non-coding transcripts

BACKGROUND: Total-RNA sequencing (total-RNA-seq) allows the simultaneous study of both the coding and the non-coding transcriptome. Yet, computational pipelines have traditionally focused on particular biotypes, making assumptions that are not fulfilled by total-RNA-seq datasets. Transcripts from d...

Bibliographic Details
Main authors: Hita, Andrea; Brocart, Gilles; Fernandez, Ana; Rehmsmeier, Marc; Alemany, Anna; Schvartzman, Sol
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects: Software
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8760670/
https://www.ncbi.nlm.nih.gov/pubmed/35030988
http://dx.doi.org/10.1186/s12859-021-04544-3
author Hita, Andrea
Brocart, Gilles
Fernandez, Ana
Rehmsmeier, Marc
Alemany, Anna
Schvartzman, Sol
collection PubMed
description BACKGROUND: Total-RNA sequencing (total-RNA-seq) allows the simultaneous study of both the coding and the non-coding transcriptome. Yet, computational pipelines have traditionally focused on particular biotypes, making assumptions that are not fulfilled by total-RNA-seq datasets. Transcripts from distinct RNA biotypes vary in length, biogenesis, and function, can overlap in a genomic region, and may be present in the genome with a high copy number. Consequently, reads from total-RNA-seq libraries may cause ambiguous genomic alignments, calling for flexible quantification approaches. RESULTS: Here we present Multi-Graph count (MGcount), a total-RNA-seq quantification tool combining two strategies for handling ambiguous alignments. First, MGcount assigns reads hierarchically to small-RNA and long-RNA features to account for length disparity when transcripts overlap in the same genomic position. Next, MGcount uses a graph-based approach to aggregate RNA products with similar sequences to which reads systematically multi-map. MGcount outputs a transcriptomic count matrix compatible with downstream RNA-sequencing analysis pipelines, at both bulk and single-cell resolution, together with the graphs that model repeated transcript structures for different biotypes. The software can be used as a Python module or as a single-file executable program. CONCLUSIONS: MGcount is a flexible total-RNA-seq quantification tool that successfully integrates reads that align to multiple genomic locations or that overlap with multiple gene features. Its approach is suitable for the simultaneous estimation of protein-coding, long non-coding and small non-coding transcript concentration, in both precursor and processed forms. Both source code and compiled software are available at https://github.com/hitaandrea/MGcount. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-021-04544-3.
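The two strategies named in the description can be illustrated with a minimal Python sketch. It is not MGcount's code or API: the function names, the `small`/`long` class labels, and the toy reads are hypothetical, and the grouping step uses plain connected components (union-find) as a stand-in, because the abstract says only that the approach is graph-based and does not specify the algorithm.

```python
from collections import defaultdict

SMALL, LONG = "small", "long"  # hypothetical feature classes


def assign_hierarchically(hits):
    """Prefer small-RNA features; fall back to long-RNA features otherwise."""
    small = [feature for feature, cls in hits if cls == SMALL]
    if small:
        return small
    return [feature for feature, cls in hits if cls == LONG]


def find(parent, x):
    """Union-find root lookup with path compression."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def count_reads(read_hits):
    """Count one read per feature group after both disambiguation steps."""
    # Step 1: resolve multi-overlapping hits by feature class priority.
    assigned = {read: assign_hierarchically(hits) for read, hits in read_hits.items()}

    # Step 2: link features that share a multi-mapping read (simplified
    # stand-in for the graph-based aggregation mentioned in the abstract).
    parent = {}
    for features in assigned.values():
        for feature in features:
            parent.setdefault(feature, feature)
        for feature in features[1:]:
            a, b = find(parent, features[0]), find(parent, feature)
            if a != b:
                parent[b] = a

    # Name each group after its sorted members.
    members = defaultdict(set)
    for feature in parent:
        members[find(parent, feature)].add(feature)

    counts = defaultdict(int)
    for features in assigned.values():
        if features:
            label = "+".join(sorted(members[find(parent, features[0])]))
            counts[label] += 1
    return dict(counts)


# Hypothetical toy input: read id -> list of (feature, feature class) hits.
toy_reads = {
    "r1": [("snoRNA_A", SMALL), ("host_gene", LONG)],  # overlapping features
    "r2": [("tRNA_1", SMALL), ("tRNA_2", SMALL)],      # multi-mapping read
    "r3": [("tRNA_2", SMALL)],
    "r4": [("host_gene", LONG)],
}
print(count_reads(toy_reads))
```

In this toy input, read r1 overlaps a small-RNA feature and its long host gene and is counted for the small-RNA feature only, while r2 links the two tRNA copies into a single group that also receives r3.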
format Online
Article
Text
id pubmed-8760670
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8760670 2022-01-18 MGcount: a total RNA-seq quantification tool to address multi-mapping and multi-overlapping alignments ambiguity in non-coding transcripts Hita, Andrea Brocart, Gilles Fernandez, Ana Rehmsmeier, Marc Alemany, Anna Schvartzman, Sol BMC Bioinformatics Software BioMed Central 2022-01-14 /pmc/articles/PMC8760670/ /pubmed/35030988 http://dx.doi.org/10.1186/s12859-021-04544-3 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title MGcount: a total RNA-seq quantification tool to address multi-mapping and multi-overlapping alignments ambiguity in non-coding transcripts
topic Software