
Improved classification accuracy in 1- and 2-dimensional NMR metabolomics data using the variance stabilising generalised logarithm transformation

BACKGROUND: Classifying nuclear magnetic resonance (NMR) spectra is a crucial step in many metabolomics experiments. Since several multivariate classification techniques depend upon the variance of the data, it is important to first minimise any contribution from unwanted technical variance arising...


Bibliographic Details
Main authors: Parsons, Helen M, Ludwig, Christian, Günther, Ulrich L, Viant, Mark R
Format: Text
Language: English
Published: BioMed Central 2007
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1965488/
https://www.ncbi.nlm.nih.gov/pubmed/17605789
http://dx.doi.org/10.1186/1471-2105-8-234
author Parsons, Helen M
Ludwig, Christian
Günther, Ulrich L
Viant, Mark R
collection PubMed
description BACKGROUND: Classifying nuclear magnetic resonance (NMR) spectra is a crucial step in many metabolomics experiments. Since several multivariate classification techniques depend upon the variance of the data, it is important to first minimise any contribution from unwanted technical variance arising from sample preparation and analytical measurements, and thereby maximise any contribution from wanted biological variance between different classes. The generalised logarithm (glog) transform was developed to stabilise the variance in DNA microarray datasets, but has rarely been applied to metabolomics data. In particular, it has not been rigorously evaluated against other scaling techniques used in metabolomics, nor tested on all forms of NMR spectra including 1-dimensional (1D) ¹H, projections of 2D ¹H, ¹H J-resolved (pJRES), and intact 2D J-resolved (JRES).
RESULTS: Here, the effects of the glog transform are compared against two commonly used variance stabilising techniques, autoscaling and Pareto scaling, as well as unscaled data. The four methods are evaluated in terms of the effects on the variance of NMR metabolomics data and on the classification accuracy following multivariate analysis, the latter achieved using principal component analysis followed by linear discriminant analysis. For two of three datasets analysed, classification accuracies were highest following glog transformation: 100% accuracy for discriminating 1D NMR spectra of hypoxic and normoxic invertebrate muscle, and 100% accuracy for discriminating 2D JRES spectra of fish livers sampled from two rivers. For the third dataset, pJRES spectra of urine from two breeds of dog, the glog transform and autoscaling achieved equal highest accuracies. Additionally, we extended the glog algorithm to effectively suppress noise, which proved critical for the analysis of 2D JRES spectra.
CONCLUSION: We have demonstrated that the glog and extended glog transforms stabilise the technical variance in NMR metabolomics datasets. This significantly improves the discrimination between sample classes and has resulted in higher classification accuracies compared to unscaled, autoscaled or Pareto scaled data. Additionally, we have confirmed the broad applicability of the glog approach using three disparate datasets from different biological samples using 1D NMR spectra, 1D projections of 2D JRES spectra, and intact 2D JRES spectra.
format Text
id pubmed-1965488
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1965488 2007-09-06 Parsons, Helen M Ludwig, Christian Günther, Ulrich L Viant, Mark R BMC Bioinformatics Research Article BioMed Central 2007-07-02 /pmc/articles/PMC1965488/ /pubmed/17605789 http://dx.doi.org/10.1186/1471-2105-8-234 Text en Copyright © 2007 Parsons et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Improved classification accuracy in 1- and 2-dimensional NMR metabolomics data using the variance stabilising generalised logarithm transformation
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1965488/
https://www.ncbi.nlm.nih.gov/pubmed/17605789
http://dx.doi.org/10.1186/1471-2105-8-234
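
The abstract above compares the glog transform with autoscaling and Pareto scaling. As a rough illustration only, the three operations might be sketched as follows. This is not the authors' code: the function names, the parameter name `lam`, and the toy values are assumptions made for this example. The glog form used here, ln(y + sqrt(y² + λ)), is one commonly quoted definition, and the record does not say how λ was chosen or how the extended glog is defined, so neither is shown.

```python
import math
from statistics import mean, stdev


def glog(y, lam):
    """Generalised logarithm in one common form: ln(y + sqrt(y^2 + lam)).

    `lam` (assumed > 0) is the transform parameter; unlike a plain log,
    the result is defined for zero and negative intensities.
    """
    return math.log(y + math.sqrt(y * y + lam))


def _centre_and_scale(values, power):
    """Mean-centre one variable and divide by its standard deviation ** power."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation
    return [(v - m) / (s ** power) for v in values]


def autoscale(values):
    """Autoscaling: mean-centre, then divide by the standard deviation."""
    return _centre_and_scale(values, 1.0)


def pareto_scale(values):
    """Pareto scaling: mean-centre, then divide by the square root of the standard deviation."""
    return _centre_and_scale(values, 0.5)


if __name__ == "__main__":
    # Hypothetical intensities of one spectral bin across three samples.
    intensities = [2.0, 4.0, 6.0]
    print([round(glog(v, lam=1.0), 3) for v in intensities])
    print(autoscale(intensities))
    print([round(v, 3) for v in pareto_scale(intensities)])
```

In this sketch, autoscaling and Pareto scaling act on one variable (spectral bin) across samples, while glog is applied to each intensity individually. For intensities much larger than sqrt(λ) the glog behaves like ln(2y), and near zero it is approximately linear.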