
Evaluation and integration of functional annotation pipelines for newly sequenced organisms: the potato genome as a test case

BACKGROUND: For most organisms, even if their genome sequence is available, little functional information about individual genes or proteins exists. Several annotation pipelines have been developed for functional analysis based on sequence, ‘omics’, and literature data. However, researchers encounte...


Bibliographic Details
Main Authors: Amar, David; Frades, Itziar; Danek, Agnieszka; Goldberg, Tatyana; Sharma, Sanjeev K; Hedley, Pete E; Proux-Wera, Estelle; Andreasson, Erik; Shamir, Ron; Tzfadia, Oren; Alexandersson, Erik
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4274702/
https://www.ncbi.nlm.nih.gov/pubmed/25476999
http://dx.doi.org/10.1186/s12870-014-0329-9
author Amar, David
Frades, Itziar
Danek, Agnieszka
Goldberg, Tatyana
Sharma, Sanjeev K
Hedley, Pete E
Proux-Wera, Estelle
Andreasson, Erik
Shamir, Ron
Tzfadia, Oren
Alexandersson, Erik
author_sort Amar, David
collection PubMed
description BACKGROUND: For most organisms, even if their genome sequence is available, little functional information about individual genes or proteins exists. Several annotation pipelines have been developed for functional analysis based on sequence, ‘omics’, and literature data. However, researchers encounter little guidance on how well they perform. Here, we used the recently sequenced potato genome as a case study. Potato was selected because its genome is newly sequenced and it is a non-model plant, yet relatively ample information on individual potato genes and multiple gene expression profiles are available.
RESULTS: We show that the automatic gene annotations of potato have low accuracy when compared to a “gold standard” based on experimentally validated potato genes. Furthermore, we evaluate six state-of-the-art annotation pipelines and show that their predictions are markedly dissimilar (Jaccard similarity coefficient of 0.27 between pipelines on average). To overcome this discrepancy, we introduce a simple GO structure-based algorithm that reconciles the predictions of the different pipelines. We show that the integrated annotation covers more genes, increases the number of highly co-expressed GO processes by over 50%, and obtains much higher agreement with the gold standard.
CONCLUSIONS: We find that different annotation pipelines produce different results, and show how to integrate them into a unified annotation that is of higher quality than any single pipeline. We offer an improved functional annotation of both PGSC and ITAG potato gene models, as well as tools that can be applied to additional pipelines and improve annotation in other organisms. This will greatly aid future functional analysis of ‘omics’ datasets from potato and other organisms with newly sequenced genomes. The new potato annotations are available with this paper.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12870-014-0329-9) contains supplementary material, which is available to authorized users.
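The description reports an average Jaccard similarity coefficient of 0.27 between pipelines. As a rough illustration of how such a figure can be computed, the sketch below averages the Jaccard coefficient over all pairs of pipelines. It assumes each pipeline's output is represented as a set of (gene, GO term) pairs; the record does not say what unit the authors compared, and the pipeline names, gene IDs, and function names here are hypothetical toy values, not data from the paper.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two collections.

    Two empty collections are treated as identical (similarity 1.0).
    """
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def mean_pairwise_jaccard(predictions):
    """Average Jaccard similarity over all unordered pairs of pipelines.

    `predictions` maps a pipeline name to its set of annotations.
    """
    pairs = list(combinations(predictions.values(), 2))
    if not pairs:
        raise ValueError("need at least two pipelines to compare")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy annotations: (gene, GO term) pairs per pipeline.
toy_predictions = {
    "pipeline_A": {("gene1", "GO:0006950"), ("gene2", "GO:0008152")},
    "pipeline_B": {("gene1", "GO:0006950"), ("gene3", "GO:0009058")},
    "pipeline_C": {("gene2", "GO:0008152"), ("gene3", "GO:0009058")},
}

if __name__ == "__main__":
    print(round(mean_pairwise_jaccard(toy_predictions), 3))
```

In this toy case every pair of pipelines shares one annotation out of three distinct ones, so each pairwise coefficient, and therefore the mean, is 1/3. Comparing per-gene term sets and averaging over genes would be an equally plausible reading of the reported statistic.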
format Online
Article
Text
id pubmed-4274702
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4274702 2015-01-02 Evaluation and integration of functional annotation pipelines for newly sequenced organisms: the potato genome as a test case. Amar, David; Frades, Itziar; Danek, Agnieszka; Goldberg, Tatyana; Sharma, Sanjeev K; Hedley, Pete E; Proux-Wera, Estelle; Andreasson, Erik; Shamir, Ron; Tzfadia, Oren; Alexandersson, Erik. BMC Plant Biol. Research Article. BioMed Central 2014-12-05 /pmc/articles/PMC4274702/ /pubmed/25476999 http://dx.doi.org/10.1186/s12870-014-0329-9 Text en
© Amar et al.; licensee BioMed Central Ltd. 2014. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Evaluation and integration of functional annotation pipelines for newly sequenced organisms: the potato genome as a test case
topic Research Article