
SamPler – a novel method for selecting parameters for gene functional annotation routines

BACKGROUND: As genome sequencing projects grow rapidly, the diversity of organisms with recently assembled genome sequences peaks at an unprecedented scale, thereby highlighting the need to make gene functional annotations fast and efficient. However, the (high) quality of such annotations must be g...


Bibliographic Details
Main Authors: Cruz, Fernando, Lagoa, Davide, Mendes, João, Rocha, Isabel, Ferreira, Eugénio C., Rocha, Miguel, Dias, Oscar
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6727554/
https://www.ncbi.nlm.nih.gov/pubmed/31488049
http://dx.doi.org/10.1186/s12859-019-3038-4
_version_ 1783449276058173440
author Cruz, Fernando
Lagoa, Davide
Mendes, João
Rocha, Isabel
Ferreira, Eugénio C.
Rocha, Miguel
Dias, Oscar
author_sort Cruz, Fernando
collection PubMed
description BACKGROUND: As genome sequencing projects grow rapidly, the diversity of organisms with recently assembled genome sequences peaks at an unprecedented scale, thereby highlighting the need to make gene functional annotations fast and efficient. However, the (high) quality of such annotations must be guaranteed, as this is the first indicator of the genomic potential of every organism. Automatic procedures help accelerating the annotation process, though decreasing the confidence and reliability of the outcomes. Manually curating a genome-wide annotation of genes, enzymes and transporter proteins function is a highly time-consuming, tedious and impractical task, even for the most proficient curator. Hence, a semi-automated procedure, which balances the two approaches, will increase the reliability of the annotation, while speeding up the process. In fact, a prior analysis of the annotation algorithm may leverage its performance, by manipulating its parameters, hastening the downstream processing and the manual curation of assigning functions to genes encoding proteins. RESULTS: Here SamPler, a novel strategy to select parameters for gene functional annotation routines is presented. This semi-automated method is based on the manual curation of a randomly selected set of genes/proteins. Then, in a multi-dimensional array, this sample is used to assess the automatic annotations for all possible combinations of the algorithm’s parameters. These assessments allow creating an array of confusion matrices, for which several metrics are calculated (accuracy, precision and negative predictive value) and used to reach optimal values for the parameters. CONCLUSIONS: The potential of this methodology is demonstrated with four genome functional annotations performed in merlin, an in-house user-friendly computational framework for genome-scale metabolic annotation and model reconstruction. For that, SamPler was implemented as a new plugin for the merlin tool. 
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-019-3038-4) contains supplementary material, which is available to authorized users.
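The RESULTS paragraph of the abstract describes scoring every combination of parameters against a manually curated sample, building one confusion matrix per combination, and comparing accuracy, precision and negative predictive value. The following is a minimal sketch of that idea only, not the authors' implementation or the merlin plugin. The two parameters (`alpha`, `threshold`), the scoring rule, and the sample values are hypothetical and chosen purely for illustration.

```python
from itertools import product


def metrics(tp, fp, tn, fn):
    """Return (accuracy, precision, negative predictive value)."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    npv = tn / (tn + fn) if (tn + fn) else 0.0
    return accuracy, precision, npv


def evaluate_grid(sample, alphas, thresholds):
    """Build one confusion matrix per (alpha, threshold) pair.

    sample: list of (score_a, score_b, curator_confirms) tuples, where
    curator_confirms is True when manual curation agrees with the
    automatic annotation. The weighted score below is an assumed,
    illustrative rule, not the one used by any real annotation tool.
    """
    results = {}
    for alpha, threshold in product(alphas, thresholds):
        tp = fp = tn = fn = 0
        for score_a, score_b, curator_confirms in sample:
            score = alpha * score_a + (1 - alpha) * score_b
            accepted = score >= threshold
            if accepted and curator_confirms:
                tp += 1
            elif accepted and not curator_confirms:
                fp += 1
            elif not accepted and not curator_confirms:
                tn += 1
            else:
                fn += 1
        results[(alpha, threshold)] = metrics(tp, fp, tn, fn)
    return results


# Hypothetical curated sample: two scores per gene plus the curator's verdict.
sample = [
    (0.9, 0.8, True),
    (0.8, 0.9, True),
    (0.7, 0.2, False),
    (0.2, 0.3, False),
    (0.6, 0.7, True),
    (0.3, 0.9, False),
]

results = evaluate_grid(sample, alphas=[0.0, 0.5, 1.0], thresholds=[0.25, 0.5, 0.75])

# Pick the combination with the highest accuracy (one possible criterion).
best_params = max(results, key=lambda params: results[params][0])
```

Ranking by accuracy alone is one possible choice. The abstract names three metrics, so a real selection rule would presumably weigh precision and negative predictive value as well.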
format Online
Article
Text
id pubmed-6727554
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6727554 2019-09-12 SamPler – a novel method for selecting parameters for gene functional annotation routines Cruz, Fernando Lagoa, Davide Mendes, João Rocha, Isabel Ferreira, Eugénio C. Rocha, Miguel Dias, Oscar BMC Bioinformatics Methodology Article BioMed Central 2019-09-05 /pmc/articles/PMC6727554/ /pubmed/31488049 http://dx.doi.org/10.1186/s12859-019-3038-4 Text en
© The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title SamPler – a novel method for selecting parameters for gene functional annotation routines
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6727554/
https://www.ncbi.nlm.nih.gov/pubmed/31488049
http://dx.doi.org/10.1186/s12859-019-3038-4