Repertoire-wide gene structure analyses: a case study comparing automatically predicted and manually annotated gene models

Bibliographic Details
Main authors: Wilbrandt, Jeanne; Misof, Bernhard; Panfilio, Kristen A.; Niehuis, Oliver
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6798390/
https://www.ncbi.nlm.nih.gov/pubmed/31623555
http://dx.doi.org/10.1186/s12864-019-6064-8
collection PubMed
description BACKGROUND: The location and modular structure of eukaryotic protein-coding genes in genomic sequences can be automatically predicted by gene annotation algorithms. These predictions are often used for comparative studies on gene structure, gene repertoires, and genome evolution. However, automatic annotation algorithms do not yet correctly identify all genes within a genome, and manual annotation is often necessary to obtain accurate gene models and gene sets. As manual annotation is time-consuming, only a fraction of the gene models in a genome is typically manually annotated, and this fraction often differs between species. To assess the impact of manual annotation efforts on genome-wide analyses of gene structural properties, we compared the structural properties of protein-coding genes in seven diverse insect species sequenced by the i5k initiative. RESULTS: Our results show that the subset of genes chosen for manual annotation by a research community (3.5–7% of gene models) may have structural properties (e.g., lengths and exon counts) that are not necessarily representative of a species’ gene set as a whole. Nonetheless, the structural properties of automatically generated gene models are only altered marginally (if at all) through manual annotation. Major correlative trends, for example a negative correlation between genome size and exonic proportion, can be inferred from either the automatically predicted or manually annotated gene models alike. Conversely, some previously reported trends did not appear in either the automatic or manually annotated gene sets, pointing towards insect-specific gene structural peculiarities. CONCLUSIONS: In our analysis of gene structural properties, automatically predicted gene models proved to be sufficiently reliable to recover the same gene-repertoire-wide correlative trends that we found when focusing on manually annotated gene models only. 
We acknowledge that analyses at the individual gene level clearly benefit from manual curation. However, as genome sequencing and annotation projects often differ in the extent of their manual annotation and curation efforts, our results indicate that comparative studies analyzing gene structural properties in these genomes can nonetheless be justifiable and informative. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12864-019-6064-8) contains supplementary material, which is available to authorized users.
id pubmed-6798390
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-6798390 2019-10-21 Repertoire-wide gene structure analyses: a case study comparing automatically predicted and manually annotated gene models Wilbrandt, Jeanne Misof, Bernhard Panfilio, Kristen A. Niehuis, Oliver BMC Genomics Research Article BioMed Central 2019-10-17 /pmc/articles/PMC6798390/ /pubmed/31623555 http://dx.doi.org/10.1186/s12864-019-6064-8 Text en © The Author(s). 2019 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
topic Research Article