A biosynthetically informed distance measure to compare secondary metabolite profiles

Bibliographic Details
Main author: Junker, Robert R.
Format: Online Article Text
Language: English
Published: Springer International Publishing 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5840250/
https://www.ncbi.nlm.nih.gov/pubmed/29540963
http://dx.doi.org/10.1007/s00049-017-0250-4
_version_ 1783304539449851904
author Junker, Robert R.
author_facet Junker, Robert R.
author_sort Junker, Robert R.
collection PubMed
description Secondary metabolite profiles are one of the most diverse phenotypes of organisms and can consist of a large number of compounds originating from a limited number of biosynthetic pathways. The statistical treatment of such profiles is often complicated by their diversity as well as by the intra- and interspecific variability in the quantitative and qualitative composition of secondary metabolites. Most importantly, the assumption that the presence/absence and the quantity of compounds are independent is violated because many compounds share a biosynthetic origin. Therefore, I propose a biosynthetically informed pairwise distance measure that fully considers the biosynthesis of the compounds and thus quantifies the similarity in the enzymatic equipment of two samples. The biosynthetic similarity of compounds is calculated based on the proportion of shared enzymes that are required for their biosynthesis. Using this information (provided as a dendrogram structure) and the quantitative composition of the samples, generalized UniFrac distances are calculated, measuring the fraction of the dendrogram (i.e., the branch lengths) that is unique to either of the samples but not shared by both. To allow a straightforward cross-platform application of the approach, I provide functions for the statistical software R and sample data sets. A hypothetical and a real-world example show the feasibility of the biosynthetically informed distances d(A,B) and highlight the differences from conventional distance measures. The advantages of this approach and potential fields of application are discussed. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1007/s00049-017-0250-4) contains supplementary material, which is available to authorized users.
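The calculation outlined in the description can be illustrated with a minimal Python sketch. It is not the author's R implementation. The compound names, enzyme sets, branch lengths, and the function names `shared_enzyme_proportion` and `generalized_unifrac` are hypothetical; the Jaccard index is only one assumed reading of "proportion of shared enzymes"; and the dendrogram is written out by hand as a list of branches rather than derived by clustering. The distance follows the commonly used generalized UniFrac formula with a weighting exponent `alpha`.

```python
def shared_enzyme_proportion(enzymes_a, enzymes_b):
    """Proportion of enzymes shared by two compounds.

    Assumption: computed as the Jaccard index (shared / all enzymes involved).
    """
    union = enzymes_a | enzymes_b
    if not union:
        return 0.0
    return len(enzymes_a & enzymes_b) / len(union)


def generalized_unifrac(branches, sample_a, sample_b, alpha=0.5):
    """Generalized UniFrac distance between two samples.

    branches: list of (branch_length, set of compounds below that branch)
    sample_a, sample_b: dict mapping compound name -> amount
    alpha: exponent controlling how strongly abundant lineages are weighted
    """
    total_a = sum(sample_a.values())
    total_b = sum(sample_b.values())
    if total_a <= 0 or total_b <= 0:
        raise ValueError("each sample needs a positive total amount")

    numerator = 0.0
    denominator = 0.0
    for length, leaves in branches:
        # Share of each sample's total amount that descends from this branch.
        p_a = sum(sample_a.get(c, 0.0) for c in leaves) / total_a
        p_b = sum(sample_b.get(c, 0.0) for c in leaves) / total_b
        weight = p_a + p_b
        if weight == 0:
            continue  # branch absent from both samples
        numerator += length * weight ** alpha * abs(p_a - p_b) / weight
        denominator += length * weight ** alpha
    return numerator / denominator if denominator else 0.0


# Hypothetical enzyme sets: c1 and c2 share part of a pathway, c3 does not.
enzymes = {
    "c1": {"E1", "E2", "E3"},
    "c2": {"E1", "E2", "E4"},
    "c3": {"E5", "E6"},
}
similarity_c1_c2 = shared_enzyme_proportion(enzymes["c1"], enzymes["c2"])

# Hypothetical dendrogram: c1 and c2 join on a shared internal branch,
# while c3 sits alone on its own branch.
branches = [
    (0.25, {"c1"}),
    (0.25, {"c2"}),
    (0.25, {"c1", "c2"}),
    (0.50, {"c3"}),
]

sample_a = {"c1": 1.0}
sample_b = {"c2": 1.0}
sample_c = {"c3": 1.0}

d_ab = generalized_unifrac(branches, sample_a, sample_b)
d_ac = generalized_unifrac(branches, sample_a, sample_c)
print(similarity_c1_c2, d_ab, d_ac)
```

In this toy setup, samples A and B contain different compounds but share the internal branch leading to c1 and c2, so their distance is smaller than the distance between A and C, which share no branch and reach the maximum of 1. A presence/absence measure that treats compounds as independent would rate both pairs as equally dissimilar.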
format Online
Article
Text
id pubmed-5840250
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Springer International Publishing
record_format MEDLINE/PubMed
spelling pubmed-5840250 2018-03-12 A biosynthetically informed distance measure to compare secondary metabolite profiles Junker, Robert R. Chemoecology Short Communication
Springer International Publishing 2017-11-27 2018 /pmc/articles/PMC5840250/ /pubmed/29540963 http://dx.doi.org/10.1007/s00049-017-0250-4 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
spellingShingle Short Communication
Junker, Robert R.
A biosynthetically informed distance measure to compare secondary metabolite profiles
title A biosynthetically informed distance measure to compare secondary metabolite profiles
title_sort biosynthetically informed distance measure to compare secondary metabolite profiles
topic Short Communication