MGcount: a total RNA-seq quantification tool to address multi-mapping and multi-overlapping alignments ambiguity in non-coding transcripts
BACKGROUND: Total-RNA sequencing (total-RNA-seq) allows the simultaneous study of both the coding and the non-coding transcriptome. Yet, computational pipelines have traditionally focused on particular biotypes, making assumptions that are not fulfilled by total-RNA-seq datasets. Transcripts from d...
Main authors: | Hita, Andrea; Brocart, Gilles; Fernandez, Ana; Rehmsmeier, Marc; Alemany, Anna; Schvartzman, Sol |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2022 |
Subjects: | Software |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8760670/ https://www.ncbi.nlm.nih.gov/pubmed/35030988 http://dx.doi.org/10.1186/s12859-021-04544-3 |
_version_ | 1784633371602714624 |
---|---|
author | Hita, Andrea Brocart, Gilles Fernandez, Ana Rehmsmeier, Marc Alemany, Anna Schvartzman, Sol |
author_sort | Hita, Andrea |
collection | PubMed |
description | BACKGROUND: Total-RNA sequencing (total-RNA-seq) allows the simultaneous study of both the coding and the non-coding transcriptome. Yet, computational pipelines have traditionally focused on particular biotypes, making assumptions that are not fulfilled by total-RNA-seq datasets. Transcripts from distinct RNA biotypes vary in length, biogenesis, and function, can overlap in a genomic region, and may be present in the genome with a high copy number. Consequently, reads from total-RNA-seq libraries may cause ambiguous genomic alignments, calling for flexible quantification approaches. RESULTS: Here we present Multi-Graph count (MGcount), a total-RNA-seq quantification tool combining two strategies for handling ambiguous alignments. First, MGcount assigns reads hierarchically to small-RNA and long-RNA features to account for length disparity when transcripts overlap in the same genomic position. Next, MGcount uses a graph-based approach to aggregate RNA products with similar sequences, where reads systematically multi-map. MGcount outputs a transcriptomic count matrix compatible with RNA-sequencing downstream analysis pipelines, with both bulk and single-cell resolution, and the graphs that model repeated transcript structures for different biotypes. The software can be used as a Python module or as a single-file executable program. CONCLUSIONS: MGcount is a flexible total-RNA-seq quantification tool that successfully integrates reads that align to multiple genomic locations or that overlap with multiple gene features. Its approach is suitable for the simultaneous estimation of protein-coding, long non-coding and small non-coding transcript concentration, in both precursor and processed forms. Both source code and compiled software are available at https://github.com/hitaandrea/MGcount. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-021-04544-3. |
format | Online Article Text |
id | pubmed-8760670 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-8760670 2022-01-18 MGcount: a total RNA-seq quantification tool to address multi-mapping and multi-overlapping alignments ambiguity in non-coding transcripts Hita, Andrea Brocart, Gilles Fernandez, Ana Rehmsmeier, Marc Alemany, Anna Schvartzman, Sol BMC Bioinformatics Software BACKGROUND: Total-RNA sequencing (total-RNA-seq) allows the simultaneous study of both the coding and the non-coding transcriptome. Yet, computational pipelines have traditionally focused on particular biotypes, making assumptions that are not fulfilled by total-RNA-seq datasets. Transcripts from distinct RNA biotypes vary in length, biogenesis, and function, can overlap in a genomic region, and may be present in the genome with a high copy number. Consequently, reads from total-RNA-seq libraries may cause ambiguous genomic alignments, calling for flexible quantification approaches. RESULTS: Here we present Multi-Graph count (MGcount), a total-RNA-seq quantification tool combining two strategies for handling ambiguous alignments. First, MGcount assigns reads hierarchically to small-RNA and long-RNA features to account for length disparity when transcripts overlap in the same genomic position. Next, MGcount uses a graph-based approach to aggregate RNA products with similar sequences, where reads systematically multi-map. MGcount outputs a transcriptomic count matrix compatible with RNA-sequencing downstream analysis pipelines, with both bulk and single-cell resolution, and the graphs that model repeated transcript structures for different biotypes. The software can be used as a Python module or as a single-file executable program. CONCLUSIONS: MGcount is a flexible total-RNA-seq quantification tool that successfully integrates reads that align to multiple genomic locations or that overlap with multiple gene features. Its approach is suitable for the simultaneous estimation of protein-coding, long non-coding and small non-coding transcript concentration, in both precursor and processed forms. Both source code and compiled software are available at https://github.com/hitaandrea/MGcount. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-021-04544-3. BioMed Central 2022-01-14 /pmc/articles/PMC8760670/ /pubmed/35030988 http://dx.doi.org/10.1186/s12859-021-04544-3 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Software Hita, Andrea Brocart, Gilles Fernandez, Ana Rehmsmeier, Marc Alemany, Anna Schvartzman, Sol MGcount: a total RNA-seq quantification tool to address multi-mapping and multi-overlapping alignments ambiguity in non-coding transcripts |
title | MGcount: a total RNA-seq quantification tool to address multi-mapping and multi-overlapping alignments ambiguity in non-coding transcripts |
topic | Software |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8760670/ https://www.ncbi.nlm.nih.gov/pubmed/35030988 http://dx.doi.org/10.1186/s12859-021-04544-3 |
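The two strategies named in the description field (hierarchical assignment of reads to small-RNA before long-RNA features, then graph-based aggregation of features that share multi-mapping reads) can be illustrated with a minimal Python sketch. This is not MGcount's code or API: the function names, the input layout, and the feature names are hypothetical, and the record does not say which graph algorithm the tool uses, so plain connected components (via union-find) are swapped in here as the simplest stand-in.

```python
from collections import defaultdict

SMALL, LONG = "small", "long"  # hypothetical feature classes


def assign_hierarchically(overlaps):
    """Prefer small-RNA features; fall back to long-RNA features only if none overlap."""
    small = [name for name, kind in overlaps if kind == SMALL]
    if small:
        return small
    return [name for name, kind in overlaps if kind == LONG]


def group_features(read_hits):
    """Map each feature to a group label: the smallest name in its connected component.

    Two features are connected when at least one read is assigned to both.
    (Assumption: stand-in for whatever graph method the real tool applies.)
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)  # keep the smallest name as root

    for features in read_hits.values():
        for feature in features:
            find(feature)  # register every feature as a node
        for other in features[1:]:
            union(features[0], other)
    return {feature: find(feature) for feature in list(parent)}


def count_reads(reads):
    """reads: {read_id: [alignment, ...]}, alignment = [(feature, class), ...]."""
    read_hits = {}
    for read_id, alignments in reads.items():
        hits = []
        for overlaps in alignments:
            for feature in assign_hierarchically(overlaps):
                if feature not in hits:
                    hits.append(feature)
        if hits:  # reads overlapping no feature are left uncounted
            read_hits[read_id] = hits
    group_of = group_features(read_hits)
    counts = defaultdict(int)
    for hits in read_hits.values():
        # All hits of one read share a component, so one lookup is enough.
        counts[group_of[hits[0]]] += 1
    return dict(counts)


# Toy data with made-up feature names.
reads = {
    "r1": [[("SNORD1", SMALL), ("GENE_A", LONG)]],      # small RNA inside a long gene
    "r2": [[("MIR1-1", SMALL)], [("MIR1-2", SMALL)]],   # multi-mapper over two copies
    "r3": [[("GENE_A", LONG)]],
    "r4": [[("MIR1-2", SMALL)]],
}
counts = count_reads(reads)
print(counts)
```

In this toy input, read `r1` is counted for the small-RNA feature rather than its long host gene, and the two hypothetical miRNA copies are merged into one group because `r2` aligns to both, so `r4` is counted in that group as well. Every read is counted once, which is the property a downstream count matrix needs.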