CoCo: RNA-seq read assignment correction for nested genes and multimapped reads
MOTIVATION: Next-generation sequencing techniques revolutionized the study of RNA expression by permitting whole transcriptome analysis. However, sequencing reads generated from nested and multi-copy genes are often either misassigned or discarded, which greatly reduces both quantification accuracy...
Main Authors: | Deschamps-Francoeur, Gabrielle; Boivin, Vincent; Abou Elela, Sherif; Scott, Michelle S |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2019 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6901076/ https://www.ncbi.nlm.nih.gov/pubmed/31141144 http://dx.doi.org/10.1093/bioinformatics/btz433 |
_version_ | 1783477449675243520 |
---|---|
author | Deschamps-Francoeur, Gabrielle Boivin, Vincent Abou Elela, Sherif Scott, Michelle S |
author_sort | Deschamps-Francoeur, Gabrielle |
collection | PubMed |
description | MOTIVATION: Next-generation sequencing techniques revolutionized the study of RNA expression by permitting whole transcriptome analysis. However, sequencing reads generated from nested and multi-copy genes are often either misassigned or discarded, which greatly reduces both quantification accuracy and gene coverage. RESULTS: Here we present count corrector (CoCo), a read assignment pipeline that takes into account the multitude of overlapping and repetitive genes in the transcriptome of higher eukaryotes. CoCo uses a modified annotation file that highlights nested genes and proportionally distributes multimapped reads between repeated sequences. CoCo salvages over 15% of discarded aligned RNA-seq reads and significantly changes the abundance estimates for both coding and non-coding RNA as validated by PCR and bedgraph comparisons. AVAILABILITY AND IMPLEMENTATION: The CoCo software is an open source package written in Python and available from http://gitlabscottgroup.med.usherbrooke.ca/scott-group/coco. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-6901076 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-69010762019-12-16 CoCo: RNA-seq read assignment correction for nested genes and multimapped reads Deschamps-Francoeur, Gabrielle Boivin, Vincent Abou Elela, Sherif Scott, Michelle S Bioinformatics Original Papers MOTIVATION: Next-generation sequencing techniques revolutionized the study of RNA expression by permitting whole transcriptome analysis. However, sequencing reads generated from nested and multi-copy genes are often either misassigned or discarded, which greatly reduces both quantification accuracy and gene coverage. RESULTS: Here we present count corrector (CoCo), a read assignment pipeline that takes into account the multitude of overlapping and repetitive genes in the transcriptome of higher eukaryotes. CoCo uses a modified annotation file that highlights nested genes and proportionally distributes multimapped reads between repeated sequences. CoCo salvages over 15% of discarded aligned RNA-seq reads and significantly changes the abundance estimates for both coding and non-coding RNA as validated by PCR and bedgraph comparisons. AVAILABILITY AND IMPLEMENTATION: The CoCo software is an open source package written in Python and available from http://gitlabscottgroup.med.usherbrooke.ca/scott-group/coco. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. Oxford University Press 2019-12-01 2019-05-29 /pmc/articles/PMC6901076/ /pubmed/31141144 http://dx.doi.org/10.1093/bioinformatics/btz433 Text en © The Author(s) 2019. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | CoCo: RNA-seq read assignment correction for nested genes and multimapped reads |
topic | Original Papers |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6901076/ https://www.ncbi.nlm.nih.gov/pubmed/31141144 http://dx.doi.org/10.1093/bioinformatics/btz433 |