Machine learning methods for metabolic pathway prediction

BACKGROUND: A key challenge in systems biology is the reconstruction of an organism's metabolic network from its genome sequence. One strategy for addressing this problem is to predict which metabolic pathways, from a reference database of known pathways, are present in the organism, based on t...

Bibliographic Details
Main authors: Dale, Joseph M; Popescu, Liviu; Karp, Peter D
Format: Online Article Text
Language: English
Published: BioMed Central 2010
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3146072/
https://www.ncbi.nlm.nih.gov/pubmed/20064214
http://dx.doi.org/10.1186/1471-2105-11-15
author Dale, Joseph M
Popescu, Liviu
Karp, Peter D
collection PubMed
description BACKGROUND: A key challenge in systems biology is the reconstruction of an organism's metabolic network from its genome sequence. One strategy for addressing this problem is to predict which metabolic pathways, from a reference database of known pathways, are present in the organism, based on the annotated genome of the organism. RESULTS: To quantitatively validate methods for pathway prediction, we developed a large "gold standard" dataset of 5,610 pathway instances known to be present or absent in curated metabolic pathway databases for six organisms. We defined a collection of 123 pathway features, whose information content we evaluated with respect to the gold standard. Feature data were used as input to an extensive collection of machine learning (ML) methods, including naïve Bayes, decision trees, and logistic regression, together with feature selection and ensemble methods. We compared the ML methods to the previous PathoLogic algorithm for pathway prediction using the gold standard dataset. We found that ML-based prediction methods can match the performance of the PathoLogic algorithm. PathoLogic achieved an accuracy of 91% and an F-measure of 0.786. The ML-based prediction methods achieved accuracy as high as 91.2% and F-measure as high as 0.787. The ML-based methods output a probability for each predicted pathway, whereas PathoLogic does not, which provides more information to the user and facilitates filtering of predicted pathways. CONCLUSIONS: ML methods for pathway prediction perform as well as existing methods, and have qualitative advantages in terms of extensibility, tunability, and explainability. More advanced prediction methods and/or more sophisticated input features may improve the performance of ML methods. However, pathway prediction performance appears to be limited largely by the ability to correctly match enzymes to the reactions they catalyze based on genome annotations.
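The accuracy and F-measure figures quoted in the abstract, and the filtering that per-pathway probabilities make possible, can be illustrated with a minimal sketch. It assumes the F-measure is the balanced F1 score (the abstract does not say which variant was used). The `Prediction` class, the `evaluate` function, the pathway names, and the toy numbers are all hypothetical and are not taken from the paper or its gold standard dataset.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    pathway: str        # hypothetical pathway identifier
    probability: float  # predicted probability that the pathway is present
    present: bool       # gold-standard label (True = present in the organism)


def evaluate(predictions, threshold=0.5):
    """Return (accuracy, f_measure) when pathways whose probability is at or
    above `threshold` are called present. F-measure here is assumed to be F1."""
    tp = fp = tn = fn = 0
    for p in predictions:
        called = p.probability >= threshold
        if called and p.present:
            tp += 1
        elif called:
            fp += 1
        elif p.present:
            fn += 1
        else:
            tn += 1
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return accuracy, f_measure


# Made-up toy data for illustration only.
toy = [
    Prediction("pathway-A", 0.95, True),
    Prediction("pathway-B", 0.70, False),
    Prediction("pathway-C", 0.40, True),
    Prediction("pathway-D", 0.10, False),
]

accuracy, f_measure = evaluate(toy, threshold=0.5)
strict_accuracy, strict_f_measure = evaluate(toy, threshold=0.8)
```

Raising the threshold in this toy example removes the one false positive, which is the kind of filtering a probability output allows and a plain present/absent call does not.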
format Online
Article
Text
id pubmed-3146072
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3146072 2011-07-30 Machine learning methods for metabolic pathway prediction Dale, Joseph M; Popescu, Liviu; Karp, Peter D BMC Bioinformatics Research Article BioMed Central 2010-01-08 /pmc/articles/PMC3146072/ /pubmed/20064214 http://dx.doi.org/10.1186/1471-2105-11-15 Text en Copyright ©2010 Dale et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Machine learning methods for metabolic pathway prediction
topic Research Article