A dual transcript-discovery approach to improve the delimitation of gene features from RNA-seq data in the chicken model

Bibliographic Details
Main authors: Orgeur, Mickael, Martens, Marvin, Börno, Stefan T., Timmermann, Bernd, Duprez, Delphine, Stricker, Sigmar
Format: Online Article Text
Language: English
Published: The Company of Biologists Ltd 2017
Subjects: Methods & Techniques
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5827264/
https://www.ncbi.nlm.nih.gov/pubmed/29183907
http://dx.doi.org/10.1242/bio.028498
author Orgeur, Mickael
Martens, Marvin
Börno, Stefan T.
Timmermann, Bernd
Duprez, Delphine
Stricker, Sigmar
collection PubMed
description The sequence of the chicken genome, like several other draft genome sequences, is presently not fully covered. Gaps, contigs assigned with low confidence and uncharacterized chromosomes result in gene fragmentation and imprecise gene annotation. Transcript abundance estimation from RNA sequencing (RNA-seq) data relies on read quality, library complexity and expression normalization. In addition, the quality of the genome sequence used to map sequencing reads, and the gene annotation that defines gene features, must also be taken into account. A partially covered genome sequence causes the loss of sequencing reads from the mapping step, while an inaccurate definition of gene features induces imprecise read counts from the assignment step. Both steps can significantly bias interpretation of RNA-seq data. Here, we describe a dual transcript-discovery approach combining a genome-guided gene prediction and a de novo transcriptome assembly. This dual approach enabled us to increase the assignment rate of RNA-seq data by nearly 20% as compared to when using only the chicken reference annotation, contributing therefore to a more accurate estimation of transcript abundance. More generally, this strategy could be applied to any organism with partial genome sequence and/or lacking a manually-curated reference annotation in order to improve the accuracy of gene expression studies.
format Online
Article
Text
id pubmed-5827264
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher The Company of Biologists Ltd
record_format MEDLINE/PubMed
spelling pubmed-5827264 2018-02-28 Biol Open Methods & Techniques The Company of Biologists Ltd 2017-11-28 Text en © 2018. Published by The Company of Biologists Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution and reproduction in any medium provided that the original work is properly attributed.
title A dual transcript-discovery approach to improve the delimitation of gene features from RNA-seq data in the chicken model
topic Methods & Techniques
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5827264/
https://www.ncbi.nlm.nih.gov/pubmed/29183907
http://dx.doi.org/10.1242/bio.028498