Consideration of non-canonical splice sites improves gene prediction on the Arabidopsis thaliana Niederzenz-1 genome sequence
OBJECTIVE: The Arabidopsis thaliana Niederzenz-1 genome sequence was recently published with an ab initio gene prediction. In-depth analysis of the predicted gene set revealed some errors involving genes with non-canonical splice sites in their introns. Since non-canonical splice sites are difficult...
Main authors: | Pucker, Boas; Holtgräwe, Daniela; Weisshaar, Bernd |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2017 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5716242/ https://www.ncbi.nlm.nih.gov/pubmed/29202864 http://dx.doi.org/10.1186/s13104-017-2985-y |
_version_ | 1783283909086150656 |
---|---|
author | Pucker, Boas Holtgräwe, Daniela Weisshaar, Bernd |
author_facet | Pucker, Boas Holtgräwe, Daniela Weisshaar, Bernd |
author_sort | Pucker, Boas |
collection | PubMed |
description | OBJECTIVE: The Arabidopsis thaliana Niederzenz-1 genome sequence was recently published with an ab initio gene prediction. In-depth analysis of the predicted gene set revealed some errors involving genes with non-canonical splice sites in their introns. Since non-canonical splice sites are difficult to predict ab initio, we checked for options to improve the annotation by transferring annotation information from the recently released Columbia-0 reference genome sequence annotation Araport11. RESULTS: Incorporation of hints generated from Araport11 enabled the precise prediction of non-canonical splice sites. Manual inspection of RNA-Seq read mapping and RT-PCR were applied to validate the structural annotations of non-canonical splice sites. Predictions of untranslated regions were also updated by harnessing the potential of Araport11’s information, which was generated by using high-coverage RNA-Seq data. The improved gene set of the Nd-1 genome assembly (GeneSet_Nd-1_v1.1) was evaluated via comparison to the initial gene prediction (GeneSet_Nd-1_v1.0) as well as against Araport11 for the Col-0 reference genome sequence. GeneSet_Nd-1_v1.1 contains previously missed non-canonical splice sites in 1256 genes. Reciprocal best hits for 24,527 (89.4%) of all nuclear Col-0 genes against the GeneSet_Nd-1_v1.1 indicate a high gene prediction quality. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s13104-017-2985-y) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-5716242 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-5716242 2017-12-08 Consideration of non-canonical splice sites improves gene prediction on the Arabidopsis thaliana Niederzenz-1 genome sequence Pucker, Boas Holtgräwe, Daniela Weisshaar, Bernd BMC Res Notes Research Note OBJECTIVE: The Arabidopsis thaliana Niederzenz-1 genome sequence was recently published with an ab initio gene prediction. In-depth analysis of the predicted gene set revealed some errors involving genes with non-canonical splice sites in their introns. Since non-canonical splice sites are difficult to predict ab initio, we checked for options to improve the annotation by transferring annotation information from the recently released Columbia-0 reference genome sequence annotation Araport11. RESULTS: Incorporation of hints generated from Araport11 enabled the precise prediction of non-canonical splice sites. Manual inspection of RNA-Seq read mapping and RT-PCR were applied to validate the structural annotations of non-canonical splice sites. Predictions of untranslated regions were also updated by harnessing the potential of Araport11’s information, which was generated by using high-coverage RNA-Seq data. The improved gene set of the Nd-1 genome assembly (GeneSet_Nd-1_v1.1) was evaluated via comparison to the initial gene prediction (GeneSet_Nd-1_v1.0) as well as against Araport11 for the Col-0 reference genome sequence. GeneSet_Nd-1_v1.1 contains previously missed non-canonical splice sites in 1256 genes. Reciprocal best hits for 24,527 (89.4%) of all nuclear Col-0 genes against the GeneSet_Nd-1_v1.1 indicate a high gene prediction quality. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s13104-017-2985-y) contains supplementary material, which is available to authorized users.
BioMed Central 2017-12-04 /pmc/articles/PMC5716242/ /pubmed/29202864 http://dx.doi.org/10.1186/s13104-017-2985-y Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Note Pucker, Boas Holtgräwe, Daniela Weisshaar, Bernd Consideration of non-canonical splice sites improves gene prediction on the Arabidopsis thaliana Niederzenz-1 genome sequence |
title | Consideration of non-canonical splice sites improves gene prediction on the Arabidopsis thaliana Niederzenz-1 genome sequence |
title_full | Consideration of non-canonical splice sites improves gene prediction on the Arabidopsis thaliana Niederzenz-1 genome sequence |
title_fullStr | Consideration of non-canonical splice sites improves gene prediction on the Arabidopsis thaliana Niederzenz-1 genome sequence |
title_full_unstemmed | Consideration of non-canonical splice sites improves gene prediction on the Arabidopsis thaliana Niederzenz-1 genome sequence |
title_short | Consideration of non-canonical splice sites improves gene prediction on the Arabidopsis thaliana Niederzenz-1 genome sequence |
title_sort | consideration of non-canonical splice sites improves gene prediction on the arabidopsis thaliana niederzenz-1 genome sequence |
topic | Research Note |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5716242/ https://www.ncbi.nlm.nih.gov/pubmed/29202864 http://dx.doi.org/10.1186/s13104-017-2985-y |
work_keys_str_mv | AT puckerboas considerationofnoncanonicalsplicesitesimprovesgenepredictiononthearabidopsisthaliananiederzenz1genomesequence AT holtgrawedaniela considerationofnoncanonicalsplicesitesimprovesgenepredictiononthearabidopsisthaliananiederzenz1genomesequence AT weisshaarbernd considerationofnoncanonicalsplicesitesimprovesgenepredictiononthearabidopsisthaliananiederzenz1genomesequence |