
Automatic Assignment of Prokaryotic Genes to Functional Categories Using Literature Profiling

In recent years, there has been an exponential increase in the number of publicly available genomes. Once finished, most genome projects lack financial support to review annotations. A few of these gene annotations are based on a combination of bioinformatics evidence; however, in most cases, annotatio...

Bibliographic Details
Main authors: Torrieri, Raul; Oliveira, Francislon S.; Oliveira, Guilherme; Coimbra, Roney S.
Format: Online Article Text
Language: English
Published: Public Library of Science 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3471813/
https://www.ncbi.nlm.nih.gov/pubmed/23077617
http://dx.doi.org/10.1371/journal.pone.0047436
_version_ 1782246470387236864
author Torrieri, Raul
Oliveira, Francislon S.
Oliveira, Guilherme
Coimbra, Roney S.
author_facet Torrieri, Raul
Oliveira, Francislon S.
Oliveira, Guilherme
Coimbra, Roney S.
author_sort Torrieri, Raul
collection PubMed
description In recent years, there has been an exponential increase in the number of publicly available genomes. Once finished, most genome projects lack financial support to review annotations. A few of these gene annotations are based on a combination of bioinformatics evidence; however, in most cases, annotations are based solely on sequence similarity to a previously known gene, which was most probably annotated in the same way. As a result, a large number of predicted genes remain unassigned to any functional category even though there is enough evidence in the literature to predict their function. We developed a classifier trained with term-frequency vectors automatically extracted from text corpora of an ensemble of genes representative of each functional category of the J. Craig Venter Institute Comprehensive Microbial Resource (JCVI-CMR) ontology. The classifier achieved up to 84% precision with 68% recall (for confidence ≥ 0.4), F-measure 0.76 (recall and precision equally weighted), in an independent set of 2,220 genes from 13 bacterial species, previously classified by JCVI-CMR into unambiguous categories of its ontology. Finally, the classifier assigned (confidence ≥ 0.7) to functional categories a total of 5,235 of the ~24,000 genes previously in the categories “Unknown function” or “Unclassified” for which there is literature in MEDLINE. Two biologists reviewed the literature of 100 of these genes, randomly picked, and assigned them to the same functional categories predicted by the automatic classifier. Our results confirmed the hypothesis that it is possible to confidently assign genes of a real-world repository to functional categories based exclusively on the automatic profiling of their associated literature. The LitProf - Gene Classifier web server is accessible at: www.cebio.org/litprofGC.
format Online
Article
Text
id pubmed-3471813
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3471813 2012-10-17 Automatic Assignment of Prokaryotic Genes to Functional Categories Using Literature Profiling Torrieri, Raul Oliveira, Francislon S. Oliveira, Guilherme Coimbra, Roney S. PLoS One Research Article In recent years, there has been an exponential increase in the number of publicly available genomes. Once finished, most genome projects lack financial support to review annotations. A few of these gene annotations are based on a combination of bioinformatics evidence; however, in most cases, annotations are based solely on sequence similarity to a previously known gene, which was most probably annotated in the same way. As a result, a large number of predicted genes remain unassigned to any functional category even though there is enough evidence in the literature to predict their function. We developed a classifier trained with term-frequency vectors automatically extracted from text corpora of an ensemble of genes representative of each functional category of the J. Craig Venter Institute Comprehensive Microbial Resource (JCVI-CMR) ontology. The classifier achieved up to 84% precision with 68% recall (for confidence ≥ 0.4), F-measure 0.76 (recall and precision equally weighted), in an independent set of 2,220 genes from 13 bacterial species, previously classified by JCVI-CMR into unambiguous categories of its ontology. Finally, the classifier assigned (confidence ≥ 0.7) to functional categories a total of 5,235 of the ~24,000 genes previously in the categories “Unknown function” or “Unclassified” for which there is literature in MEDLINE. Two biologists reviewed the literature of 100 of these genes, randomly picked, and assigned them to the same functional categories predicted by the automatic classifier. Our results confirmed the hypothesis that it is possible to confidently assign genes of a real-world repository to functional categories based exclusively on the automatic profiling of their associated literature.
The LitProf - Gene Classifier web server is accessible at: www.cebio.org/litprofGC. Public Library of Science 2012-10-15 /pmc/articles/PMC3471813/ /pubmed/23077617 http://dx.doi.org/10.1371/journal.pone.0047436 Text en © 2012 Torrieri et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
spellingShingle Research Article
Torrieri, Raul
Oliveira, Francislon S.
Oliveira, Guilherme
Coimbra, Roney S.
Automatic Assignment of Prokaryotic Genes to Functional Categories Using Literature Profiling
title Automatic Assignment of Prokaryotic Genes to Functional Categories Using Literature Profiling
title_full Automatic Assignment of Prokaryotic Genes to Functional Categories Using Literature Profiling
title_fullStr Automatic Assignment of Prokaryotic Genes to Functional Categories Using Literature Profiling
title_full_unstemmed Automatic Assignment of Prokaryotic Genes to Functional Categories Using Literature Profiling
title_short Automatic Assignment of Prokaryotic Genes to Functional Categories Using Literature Profiling
title_sort automatic assignment of prokaryotic genes to functional categories using literature profiling
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3471813/
https://www.ncbi.nlm.nih.gov/pubmed/23077617
http://dx.doi.org/10.1371/journal.pone.0047436
work_keys_str_mv AT torrieriraul automaticassignmentofprokaryoticgenestofunctionalcategoriesusingliteratureprofiling
AT oliveirafrancislons automaticassignmentofprokaryoticgenestofunctionalcategoriesusingliteratureprofiling
AT oliveiraguilherme automaticassignmentofprokaryoticgenestofunctionalcategoriesusingliteratureprofiling
AT coimbraroneys automaticassignmentofprokaryoticgenestofunctionalcategoriesusingliteratureprofiling
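
The description above mentions term-frequency vectors, a confidence threshold, and an equally weighted F-measure. The following is a minimal illustrative sketch of those ideas, not the LitProf implementation: the function names, the whitespace tokenizer, the toy category profiles, and the use of cosine similarity as a stand-in for the classifier's confidence score are all assumptions made for this example. With the rounded precision (0.84) and recall (0.68) quoted in the record, the equally weighted F-measure comes out at about 0.75, close to the reported 0.76; the small gap is presumably due to rounding of the published figures.

```python
import math
from collections import Counter


def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall.

    beta=1.0 weights recall and precision equally.
    """
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def term_frequencies(text):
    """Map each lower-cased whitespace token to its relative frequency."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()} if total else {}


def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def classify(text, category_profiles, min_confidence=0.4):
    """Return (category, score) for the most similar profile.

    The category is None when the best score is below min_confidence.
    Cosine similarity is a hypothetical stand-in for the confidence
    value mentioned in the abstract.
    """
    vector = term_frequencies(text)
    best_category, best_score = None, 0.0
    for category, profile in category_profiles.items():
        score = cosine(vector, profile)
        if score > best_score:
            best_category, best_score = category, score
    if best_score < min_confidence:
        return None, best_score
    return best_category, best_score


# Toy profiles (invented for illustration; not JCVI-CMR data).
profiles = {
    "transport": term_frequencies("membrane transporter permease uptake"),
    "dna": term_frequencies("dna replication polymerase repair"),
}

label, score = classify("ABC transporter permease for uptake", profiles)
f1 = f_measure(0.84, 0.68)
```

Raising `min_confidence` (for example to 0.7, the threshold the abstract uses for previously unclassified genes) trades recall for precision: fewer texts receive a label, but those that do match their profile more closely.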