Improved classification accuracy in 1- and 2-dimensional NMR metabolomics data using the variance stabilising generalised logarithm transformation
BACKGROUND: Classifying nuclear magnetic resonance (NMR) spectra is a crucial step in many metabolomics experiments. Since several multivariate classification techniques depend upon the variance of the data, it is important to first minimise any contribution from unwanted technical variance arising...
Main authors: | Parsons, Helen M; Ludwig, Christian; Günther, Ulrich L; Viant, Mark R |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2007 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1965488/ https://www.ncbi.nlm.nih.gov/pubmed/17605789 http://dx.doi.org/10.1186/1471-2105-8-234 |
_version_ | 1782134678614966272 |
---|---|
author | Parsons, Helen M Ludwig, Christian Günther, Ulrich L Viant, Mark R |
author_facet | Parsons, Helen M Ludwig, Christian Günther, Ulrich L Viant, Mark R |
author_sort | Parsons, Helen M |
collection | PubMed |
description | BACKGROUND: Classifying nuclear magnetic resonance (NMR) spectra is a crucial step in many metabolomics experiments. Since several multivariate classification techniques depend upon the variance of the data, it is important to first minimise any contribution from unwanted technical variance arising from sample preparation and analytical measurements, and thereby maximise any contribution from wanted biological variance between different classes. The generalised logarithm (glog) transform was developed to stabilise the variance in DNA microarray datasets, but has rarely been applied to metabolomics data. In particular, it has not been rigorously evaluated against other scaling techniques used in metabolomics, nor tested on all forms of NMR spectra including 1-dimensional (1D) ¹H, projections of 2D ¹H, ¹H J-resolved (pJRES), and intact 2D J-resolved (JRES). RESULTS: Here, the effects of the glog transform are compared against two commonly used variance stabilising techniques, autoscaling and Pareto scaling, as well as unscaled data. The four methods are evaluated in terms of the effects on the variance of NMR metabolomics data and on the classification accuracy following multivariate analysis, the latter achieved using principal component analysis followed by linear discriminant analysis. For two of three datasets analysed, classification accuracies were highest following glog transformation: 100% accuracy for discriminating 1D NMR spectra of hypoxic and normoxic invertebrate muscle, and 100% accuracy for discriminating 2D JRES spectra of fish livers sampled from two rivers. For the third dataset, pJRES spectra of urine from two breeds of dog, the glog transform and autoscaling achieved equal highest accuracies. Additionally we extended the glog algorithm to effectively suppress noise, which proved critical for the analysis of 2D JRES spectra. CONCLUSION: We have demonstrated that the glog and extended glog transforms stabilise the technical variance in NMR metabolomics datasets. This significantly improves the discrimination between sample classes and has resulted in higher classification accuracies compared to unscaled, autoscaled or Pareto scaled data. Additionally we have confirmed the broad applicability of the glog approach using three disparate datasets from different biological samples using 1D NMR spectra, 1D projections of 2D JRES spectra, and intact 2D JRES spectra. |
format | Text |
id | pubmed-1965488 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2007 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-1965488 2007-09-06 Improved classification accuracy in 1- and 2-dimensional NMR metabolomics data using the variance stabilising generalised logarithm transformation Parsons, Helen M Ludwig, Christian Günther, Ulrich L Viant, Mark R BMC Bioinformatics Research Article BioMed Central 2007-07-02 /pmc/articles/PMC1965488/ /pubmed/17605789 http://dx.doi.org/10.1186/1471-2105-8-234 Text en Copyright © 2007 Parsons et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Parsons, Helen M Ludwig, Christian Günther, Ulrich L Viant, Mark R Improved classification accuracy in 1- and 2-dimensional NMR metabolomics data using the variance stabilising generalised logarithm transformation |
title | Improved classification accuracy in 1- and 2-dimensional NMR metabolomics data using the variance stabilising generalised logarithm transformation |
title_full | Improved classification accuracy in 1- and 2-dimensional NMR metabolomics data using the variance stabilising generalised logarithm transformation |
title_fullStr | Improved classification accuracy in 1- and 2-dimensional NMR metabolomics data using the variance stabilising generalised logarithm transformation |
title_full_unstemmed | Improved classification accuracy in 1- and 2-dimensional NMR metabolomics data using the variance stabilising generalised logarithm transformation |
title_short | Improved classification accuracy in 1- and 2-dimensional NMR metabolomics data using the variance stabilising generalised logarithm transformation |
title_sort | improved classification accuracy in 1- and 2-dimensional nmr metabolomics data using the variance stabilising generalised logarithm transformation |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1965488/ https://www.ncbi.nlm.nih.gov/pubmed/17605789 http://dx.doi.org/10.1186/1471-2105-8-234 |
work_keys_str_mv | AT parsonshelenm improvedclassificationaccuracyin1and2dimensionalnmrmetabolomicsdatausingthevariancestabilisinggeneralisedlogarithmtransformation AT ludwigchristian improvedclassificationaccuracyin1and2dimensionalnmrmetabolomicsdatausingthevariancestabilisinggeneralisedlogarithmtransformation AT guntherulrichl improvedclassificationaccuracyin1and2dimensionalnmrmetabolomicsdatausingthevariancestabilisinggeneralisedlogarithmtransformation AT viantmarkr improvedclassificationaccuracyin1and2dimensionalnmrmetabolomicsdatausingthevariancestabilisinggeneralisedlogarithmtransformation |
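
The description in this record names three scaling approaches (the glog transform, autoscaling and Pareto scaling) but gives no formulas. The following is a minimal sketch, assuming the commonly cited form of the generalised logarithm, glog(y) = ln(y + sqrt(y² + λ)), and the usual definitions of autoscaling (mean-centre, divide by the standard deviation) and Pareto scaling (mean-centre, divide by the square root of the standard deviation). The function names, the value of `lam` and the sample intensities are hypothetical and are not taken from the paper; the paper's extended glog and its procedure for choosing λ are not reproduced here.

```python
import math
from statistics import mean, stdev


def glog(y, lam):
    """Generalised logarithm, assumed form: ln(y + sqrt(y^2 + lam)).

    For lam > 0 the argument of the logarithm is positive, so zero and
    moderately negative intensities (e.g. baseline noise) stay defined.
    """
    return math.log(y + math.sqrt(y * y + lam))


def autoscale(column):
    """Mean-centre one variable and divide by its sample standard deviation."""
    m, s = mean(column), stdev(column)
    return [(v - m) / s for v in column]


def pareto_scale(column):
    """Mean-centre one variable and divide by the square root of its standard deviation."""
    m, s = mean(column), stdev(column)
    return [(v - m) / math.sqrt(s) for v in column]


# Hypothetical intensities of one spectral bin across five samples.
intensities = [0.0, 1.0, 10.0, 100.0, 1000.0]
lam = 1.0  # hypothetical transform parameter

glog_values = [glog(v, lam) for v in intensities]
auto_values = autoscale(intensities)
pareto_values = pareto_scale(intensities)

for name, values in [("glog", glog_values),
                     ("autoscaled", auto_values),
                     ("Pareto", pareto_values)]:
    print(name, [round(v, 3) for v in values])
```

Note the difference in kind: autoscaling and Pareto scaling rescale each variable using statistics computed across samples, whereas the glog is applied value by value and behaves like an ordinary logarithm for intensities much larger than sqrt(λ).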