Seven Golden Rules for heuristic filtering of molecular formulas obtained by accurate mass spectrometry

BACKGROUND: Structure elucidation of unknown small molecules by mass spectrometry is a challenge despite advances in instrumentation. The first crucial step is to obtain correct elemental compositions. In order to automatically constrain the thousands of possible candidate structures, rules need to...

Bibliographic Details
Main authors: Kind, Tobias; Fiehn, Oliver
Format: Text
Language: English
Published: BioMed Central 2007
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1851972/
https://www.ncbi.nlm.nih.gov/pubmed/17389044
http://dx.doi.org/10.1186/1471-2105-8-105
author Kind, Tobias
Fiehn, Oliver
collection PubMed
description BACKGROUND: Structure elucidation of unknown small molecules by mass spectrometry is a challenge despite advances in instrumentation. The first crucial step is to obtain correct elemental compositions. In order to automatically constrain the thousands of possible candidate structures, rules need to be developed to select the most likely and chemically correct molecular formulas.
RESULTS: An algorithm for filtering molecular formulas is derived from seven heuristic rules: (1) restrictions for the number of elements, (2) LEWIS and SENIOR chemical rules, (3) isotopic patterns, (4) hydrogen/carbon ratios, (5) element ratios of nitrogen, oxygen, phosphorus, and sulphur versus carbon, (6) element ratio probabilities and (7) presence of trimethylsilylated compounds. Formulas are ranked according to their isotopic patterns and subsequently constrained by presence in public chemical databases. The seven rules were developed on 68,237 existing molecular formulas and were validated in four experiments. First, 432,968 formulas covering five million PubChem database entries were checked for consistency. Only 0.6% of these compounds did not pass all rules. Next, the rules were shown to effectively reduce the complement of all eight billion theoretically possible C, H, N, S, O, P-formulas up to 2000 Da to only 623 million most probable elemental compositions. Third, 6,000 pharmaceutical, toxic and natural compounds were selected from the DrugBank, TSCA and DNP databases. The correct formulas were retrieved as the top hit with 80–99% probability when assuming data acquisition with complete resolution of unique compounds, 5% absolute isotope ratio deviation, and 3 ppm mass accuracy. Last, some exemplary compounds were analyzed by Fourier transform ion cyclotron resonance mass spectrometry and by gas chromatography-time of flight mass spectrometry. In each case, the correct formula was ranked as the top hit when combining the seven rules with database queries.
CONCLUSION: The seven rules enable an automatic exclusion of molecular formulas which are either wrong or which contain unlikely high or low numbers of elements. The correct molecular formula is assigned with a probability of 98% if the formula exists in a compound database. For truly novel compounds that are not present in databases, the correct formula is found in the first three hits with a probability of 65–81%. Corresponding software and supplemental data are available for download from the authors' website.
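The filtering idea summarized in the description can be illustrated with a minimal Python sketch. It is not the authors' software. It covers only a 3 ppm mass window (the accuracy quoted in the abstract), a simplified SENIOR-style valence check (rule 2), and element/carbon ratio checks (rules 4 and 5). The function names, the valence table, and the numeric ratio limits are assumptions made for illustration; this record does not state the thresholds used in the paper. Isotopic pattern ranking and database lookup are left out.

```python
import re

# Monoisotopic masses (u) of the lightest stable isotopes, rounded standard values.
# Only C, H, N, O, P and S are supported in this sketch.
MONO_MASS = {"C": 12.0, "H": 1.00782503, "N": 14.00307401,
             "O": 15.99491462, "P": 30.97376163, "S": 31.97207100}

# ASSUMPTION: lowest common valence states, used for the SENIOR-style check.
VALENCE = {"C": 4, "H": 1, "N": 3, "O": 2, "P": 3, "S": 2}

# ASSUMPTION: illustrative (min, max) element/carbon ratio limits,
# not taken from this catalog record.
RATIO_LIMITS = {"H": (0.2, 3.1), "N": (0.0, 1.3), "O": (0.0, 1.2),
                "P": (0.0, 0.3), "S": (0.0, 0.8)}


def parse_formula(formula):
    """Turn a flat formula string such as 'C6H12O6' into element counts."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def monoisotopic_mass(counts):
    """Sum the monoisotopic masses of all atoms in the formula."""
    return sum(MONO_MASS[element] * n for element, n in counts.items())


def ppm_error(measured, theoretical):
    """Relative mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


def senior_check(counts):
    """Valence sum must be even and at least 2 * (atoms - 1)."""
    total_valence = sum(VALENCE[element] * n for element, n in counts.items())
    atoms = sum(counts.values())
    return total_valence % 2 == 0 and total_valence >= 2 * (atoms - 1)


def ratio_check(counts, limits=RATIO_LIMITS):
    """Check element/carbon ratios; treated as not applicable without carbon."""
    carbon = counts.get("C", 0)
    if carbon == 0:
        return True
    for element, (low, high) in limits.items():
        ratio = counts.get(element, 0) / carbon
        if not low <= ratio <= high:
            return False
    return True


def filter_candidates(candidates, measured_mass, tol_ppm=3.0):
    """Keep candidates inside the mass window that pass both heuristic checks."""
    kept = []
    for formula in candidates:
        counts = parse_formula(formula)
        error = ppm_error(measured_mass, monoisotopic_mass(counts))
        if abs(error) <= tol_ppm and senior_check(counts) and ratio_check(counts):
            kept.append(formula)
    return kept
```

For example, `filter_candidates(["C6H12O6", "C6H13O6", "C7H16O5"], 180.0634)` keeps only `C6H12O6` under these assumptions, because the other two candidates fall outside the 3 ppm window. Passing the limits as a parameter keeps the thresholds easy to swap for the values reported in the paper.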
format Text
id pubmed-1851972
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1851972 2007-04-13 Seven Golden Rules for heuristic filtering of molecular formulas obtained by accurate mass spectrometry Kind, Tobias Fiehn, Oliver BMC Bioinformatics Research Article BioMed Central 2007-03-27 /pmc/articles/PMC1851972/ /pubmed/17389044 http://dx.doi.org/10.1186/1471-2105-8-105 Text en Copyright © 2007 Kind and Fiehn; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Seven Golden Rules for heuristic filtering of molecular formulas obtained by accurate mass spectrometry
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1851972/
https://www.ncbi.nlm.nih.gov/pubmed/17389044
http://dx.doi.org/10.1186/1471-2105-8-105