Simple topological properties predict functional misannotations in a metabolic network

Motivation: Misannotation in sequence databases is an important obstacle for automated tools for gene function annotation, which rely extensively on comparison with sequences with known function. To improve current annotations and prevent future propagation of errors, sequence-independent tools are,...

Bibliographic Details
Main Authors:	Liberal, Rodrigo; Pinney, John W.
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2013
Subjects:	ISMB/ECCB 2013 Proceedings Papers Committee, July 21 to July 23, 2013, Berlin, Germany
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3694667/ https://www.ncbi.nlm.nih.gov/pubmed/23812979 http://dx.doi.org/10.1093/bioinformatics/btt236

_version_	1782274884981751808
author	Liberal, Rodrigo Pinney, John W.
author_facet	Liberal, Rodrigo Pinney, John W.
author_sort	Liberal, Rodrigo
collection	PubMed
description	Motivation: Misannotation in sequence databases is an important obstacle for automated tools for gene function annotation, which rely extensively on comparison with sequences with known function. To improve current annotations and prevent future propagation of errors, sequence-independent tools are, therefore, needed to assist in the identification of misannotated gene products. In the case of enzymatic functions, each functional assignment implies the existence of a reaction within the organism’s metabolic network; a first approximation to a genome-scale metabolic model can be obtained directly from an automated genome annotation. Any obvious problems in the network, such as dead end or disconnected reactions, can, therefore, be strong indications of misannotation. Results: We demonstrate that a machine-learning approach using only network topological features can successfully predict the validity of enzyme annotations. The predictions are tested at three different levels. A random forest using topological features of the metabolic network and trained on curated sets of correct and incorrect enzyme assignments was found to have an accuracy of up to 86% in 5-fold cross-validation experiments. Further cross-validation against unseen enzyme superfamilies indicates that this classifier can successfully extrapolate beyond the classes of enzyme present in the training data. The random forest model was applied to several automated genome annotations, achieving an accuracy of [Image: see text] in most cases when validated against recent genome-scale metabolic models. We also observe that when applied to draft metabolic networks for multiple species, a clear negative correlation is observed between predicted annotation quality and phylogenetic distance to the major model organism for biochemistry (Escherichia coli for prokaryotes and Homo sapiens for eukaryotes). 
Contact: j.pinney@imperial.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
format	Online Article Text
id	pubmed-3694667
institution	National Center for Biotechnology Information
language	English
publishDate	2013
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-36946672013-06-27 Simple topological properties predict functional misannotations in a metabolic network Liberal, Rodrigo Pinney, John W. Bioinformatics Ismb/Eccb 2013 Proceedings Papers Committee July 21 to July 23, 2013, Berlin, Germany Motivation: Misannotation in sequence databases is an important obstacle for automated tools for gene function annotation, which rely extensively on comparison with sequences with known function. To improve current annotations and prevent future propagation of errors, sequence-independent tools are, therefore, needed to assist in the identification of misannotated gene products. In the case of enzymatic functions, each functional assignment implies the existence of a reaction within the organism’s metabolic network; a first approximation to a genome-scale metabolic model can be obtained directly from an automated genome annotation. Any obvious problems in the network, such as dead end or disconnected reactions, can, therefore, be strong indications of misannotation. Results: We demonstrate that a machine-learning approach using only network topological features can successfully predict the validity of enzyme annotations. The predictions are tested at three different levels. A random forest using topological features of the metabolic network and trained on curated sets of correct and incorrect enzyme assignments was found to have an accuracy of up to 86% in 5-fold cross-validation experiments. Further cross-validation against unseen enzyme superfamilies indicates that this classifier can successfully extrapolate beyond the classes of enzyme present in the training data. The random forest model was applied to several automated genome annotations, achieving an accuracy of [Image: see text] in most cases when validated against recent genome-scale metabolic models. 
We also observe that when applied to draft metabolic networks for multiple species, a clear negative correlation is observed between predicted annotation quality and phylogenetic distance to the major model organism for biochemistry (Escherichia coli for prokaryotes and Homo sapiens for eukaryotes). Contact: j.pinney@imperial.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. Oxford University Press 2013-07-01 2013-06-19 /pmc/articles/PMC3694667/ /pubmed/23812979 http://dx.doi.org/10.1093/bioinformatics/btt236 Text en © The Author 2013. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/3.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
spellingShingle	Ismb/Eccb 2013 Proceedings Papers Committee July 21 to July 23, 2013, Berlin, Germany Liberal, Rodrigo Pinney, John W. Simple topological properties predict functional misannotations in a metabolic network
title	Simple topological properties predict functional misannotations in a metabolic network
title_full	Simple topological properties predict functional misannotations in a metabolic network
title_fullStr	Simple topological properties predict functional misannotations in a metabolic network
title_full_unstemmed	Simple topological properties predict functional misannotations in a metabolic network
title_short	Simple topological properties predict functional misannotations in a metabolic network
title_sort	simple topological properties predict functional misannotations in a metabolic network
topic	ISMB/ECCB 2013 Proceedings Papers Committee, July 21 to July 23, 2013, Berlin, Germany
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3694667/ https://www.ncbi.nlm.nih.gov/pubmed/23812979 http://dx.doi.org/10.1093/bioinformatics/btt236
work_keys_str_mv	AT liberalrodrigo simpletopologicalpropertiespredictfunctionalmisannotationsinametabolicnetwork AT pinneyjohnw simpletopologicalpropertiespredictfunctionalmisannotationsinametabolicnetwork
