How Large Is the Metabolome? A Critical Analysis of Data Exchange Practices in Chemistry

BACKGROUND: Calculating the metabolome size of species by genome-guided reconstruction of metabolic pathways misses all products from orphan genes and from enzymes lacking annotated genes. Hence, metabolomes need to be determined experimentally. Annotations by mass spectrometry would greatly benefit...

Bibliographic Details
Main Authors:	Kind, Tobias; Scholz, Martin; Fiehn, Oliver
Format:	Text
Language:	English
Published:	Public Library of Science 2009
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2673031/ https://www.ncbi.nlm.nih.gov/pubmed/19415114 http://dx.doi.org/10.1371/journal.pone.0005440

_version_	1782166567172177920
author	Kind, Tobias Scholz, Martin Fiehn, Oliver
author_facet	Kind, Tobias Scholz, Martin Fiehn, Oliver
author_sort	Kind, Tobias
collection	PubMed
description	BACKGROUND: Calculating the metabolome size of species by genome-guided reconstruction of metabolic pathways misses all products from orphan genes and from enzymes lacking annotated genes. Hence, metabolomes need to be determined experimentally. Annotations by mass spectrometry would greatly benefit if peer-reviewed public databases could be queried to compile target lists of structures that already have been reported for a given species. We detail current obstacles to compile such a knowledge base of metabolites. RESULTS: As an example, results are presented for rice. Two rice (Oryza sativa) subspecies have been fully sequenced, Oryza japonica and Oryza indica. Several major small molecule databases were compared for listing known rice metabolites comprising PubChem, Chemical Abstracts, Beilstein, Patent databases, Dictionary of Natural Products, SetupX/BinBase, KNApSAcK DB, and finally those databases which were obtained by computational approaches, i.e. RiceCyc, KEGG, and Reactome. More than 5,000 small molecules were retrieved when searching these databases. Unfortunately, most often, genuine rice metabolites were retrieved together with non-metabolite database entries such as pesticides. Overlaps from database compound lists were very difficult to compare because structures were either not encoded in machine-readable format or because compound identifiers were not cross-referenced between databases. CONCLUSIONS: We conclude that present databases are not capable of comprehensively retrieving all known metabolites. Metabolome lists are yet mostly restricted to genome-reconstructed pathways. We suggest that providers of (bio)chemical databases enrich their database identifiers to PubChem IDs and InChIKeys to enable cross-database queries. In addition, peer-reviewed journal repositories need to mandate submission of structures and spectra in machine-readable format to allow automated semantic annotation of articles containing chemical structures. Such changes in publication standards and database architectures will enable researchers to compile current knowledge about the metabolome of species, which may extend to derived information such as spectral libraries, organ-specific metabolites, and cross-study comparisons.
format	Text
id	pubmed-2673031
institution	National Center for Biotechnology Information
language	English
publishDate	2009
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-26730312009-05-05 How Large Is the Metabolome? A Critical Analysis of Data Exchange Practices in Chemistry Kind, Tobias Scholz, Martin Fiehn, Oliver PLoS One Research Article BACKGROUND: Calculating the metabolome size of species by genome-guided reconstruction of metabolic pathways misses all products from orphan genes and from enzymes lacking annotated genes. Hence, metabolomes need to be determined experimentally. Annotations by mass spectrometry would greatly benefit if peer-reviewed public databases could be queried to compile target lists of structures that already have been reported for a given species. We detail current obstacles to compile such a knowledge base of metabolites. RESULTS: As an example, results are presented for rice. Two rice (oryza sativa) subspecies have been fully sequenced, oryza japonica and oryza indica. Several major small molecule databases were compared for listing known rice metabolites comprising PubChem, Chemical Abstracts, Beilstein, Patent databases, Dictionary of Natural Products, SetupX/BinBase, KNApSAcK DB, and finally those databases which were obtained by computational approaches, i.e. RiceCyc, KEGG, and Reactome. More than 5,000 small molecules were retrieved when searching these databases. Unfortunately, most often, genuine rice metabolites were retrieved together with non-metabolite database entries such as pesticides. Overlaps from database compound lists were very difficult to compare because structures were either not encoded in machine-readable format or because compound identifiers were not cross-referenced between databases. CONCLUSIONS: We conclude that present databases are not capable of comprehensively retrieving all known metabolites. Metabolome lists are yet mostly restricted to genome-reconstructed pathways. We suggest that providers of (bio)chemical databases enrich their database identifiers to PubChem IDs and InChIKeys to enable cross-database queries. 
In addition, peer-reviewed journal repositories need to mandate submission of structures and spectra in machine readable format to allow automated semantic annotation of articles containing chemical structures. Such changes in publication standards and database architectures will enable researchers to compile current knowledge about the metabolome of species, which may extend to derived information such as spectral libraries, organ-specific metabolites, and cross-study comparisons. Public Library of Science 2009-05-05 /pmc/articles/PMC2673031/ /pubmed/19415114 http://dx.doi.org/10.1371/journal.pone.0005440 Text en Kind et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
spellingShingle	Research Article Kind, Tobias Scholz, Martin Fiehn, Oliver How Large Is the Metabolome? A Critical Analysis of Data Exchange Practices in Chemistry
title	How Large Is the Metabolome? A Critical Analysis of Data Exchange Practices in Chemistry
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2673031/ https://www.ncbi.nlm.nih.gov/pubmed/19415114 http://dx.doi.org/10.1371/journal.pone.0005440
