Double-check: validation of diagnostic statistics for PLS-DA models in metabolomics studies

Partial Least Squares-Discriminant Analysis (PLS-DA) is a PLS regression method with a special binary ‘dummy’ y-variable; it is commonly used for classification and biomarker selection in metabolomics studies. Several statistical approaches are currently in use to validate outcomes of PL...

Bibliographic Details
Main authors: Szymańska, Ewa; Saccenti, Edoardo; Smilde, Age K.; Westerhuis, Johan A.
Format: Online Article Text
Language: English
Published: Springer US 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3337399/
https://www.ncbi.nlm.nih.gov/pubmed/22593721
http://dx.doi.org/10.1007/s11306-011-0330-3
author Szymańska, Ewa
Saccenti, Edoardo
Smilde, Age K.
Westerhuis, Johan A.
collection PubMed
description Partial Least Squares-Discriminant Analysis (PLS-DA) is a PLS regression method with a special binary ‘dummy’ y-variable; it is commonly used for classification and biomarker selection in metabolomics studies. Several statistical approaches are in use to validate the outcomes of PLS-DA analyses, e.g. double cross-validation procedures or permutation testing. However, the optimization and performance assessment of PLS-DA models are highly inconsistent because many different diagnostic statistics are employed in metabolomics data analyses. This paper discusses the properties of four diagnostic statistics of PLS-DA: the number of misclassifications (NMC), the area under the receiver operating characteristic curve (AUROC), Q² and discriminant Q² (DQ²). All four diagnostic statistics are used in the optimization and performance assessment of PLS-DA models of three metabolomics data sets of different sizes, obtained with two different types of analytical platforms and with different levels of known differences between the two groups (control and case). The statistical significance of the obtained PLS-DA models was evaluated with permutation testing. PLS-DA models obtained with NMC and AUROC are more powerful in detecting very small differences between groups than models obtained with Q² and DQ². The reproducibility of the model outcomes, model complexity and permutation test distributions are also investigated to explain this phenomenon. DQ² and Q² (in contrast to NMC and AUROC) prefer PLS-DA models with lower complexity and require a higher number of permutation tests and submodels to estimate the statistical significance of the model performance accurately. NMC and AUROC appear to be more efficient and more reliable diagnostic statistics and are recommended for two-group discrimination metabolomics studies.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1007/s11306-011-0330-3) contains supplementary material, which is available to authorized users.
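The four diagnostic statistics named in the description can be illustrated with a minimal Python sketch. The record itself gives no formulas, so everything below is an assumption: class labels are coded 0 (control) and 1 (case), the classification threshold is 0.5, Q² is taken as 1 - PRESS/TSS, and DQ² is taken as the variant of Q² that does not penalize predictions lying beyond the class label. The function names (`nmc`, `auroc`, `q2`, `dq2`, `permutation_p_value`) are hypothetical and are not the authors' code. In an actual study the predictions would come from cross-validated PLS-DA models and the null values from models refitted on permuted labels; here they are plain lists.

```python
from typing import Sequence


def nmc(y: Sequence[int], y_hat: Sequence[float], threshold: float = 0.5) -> int:
    """Number of misclassifications: predictions on the wrong side of the threshold."""
    return sum(1 for t, p in zip(y, y_hat) if (p > threshold) != (t == 1))


def auroc(y: Sequence[int], y_hat: Sequence[float]) -> float:
    """AUROC as the fraction of (case, control) pairs ranked correctly; ties count 0.5."""
    pos = [p for t, p in zip(y, y_hat) if t == 1]
    neg = [p for t, p in zip(y, y_hat) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def _tss(y: Sequence[int]) -> float:
    """Total sum of squares of the class labels around their mean."""
    mean = sum(y) / len(y)
    return sum((t - mean) ** 2 for t in y)


def q2(y: Sequence[int], y_hat: Sequence[float]) -> float:
    """Q2 = 1 - PRESS / TSS, with PRESS the sum of squared prediction errors."""
    press = sum((t - p) ** 2 for t, p in zip(y, y_hat))
    return 1.0 - press / _tss(y)


def dq2(y: Sequence[int], y_hat: Sequence[float]) -> float:
    """DQ2 (assumed definition): like Q2, but a prediction beyond its class label
    (above 1 for a case, below 0 for a control) contributes no error."""
    pressd = 0.0
    for t, p in zip(y, y_hat):
        beyond_label = (t == 1 and p > 1.0) or (t == 0 and p < 0.0)
        if not beyond_label:
            pressd += (t - p) ** 2
    return 1.0 - pressd / _tss(y)


def permutation_p_value(observed: float, null: Sequence[float],
                        higher_is_better: bool = True) -> float:
    """Share of permuted-label results at least as good as the observed one,
    with the usual +1 correction so the p-value is never exactly zero."""
    if higher_is_better:
        as_good = sum(1 for v in null if v >= observed)
    else:
        as_good = sum(1 for v in null if v <= observed)
    return (1 + as_good) / (1 + len(null))


if __name__ == "__main__":
    labels = [0, 0, 0, 1, 1, 1]                    # illustrative labels only
    predictions = [-0.2, 0.1, 0.6, 0.4, 0.9, 1.3]  # illustrative predictions only
    print("NMC  :", nmc(labels, predictions))
    print("AUROC:", auroc(labels, predictions))
    print("Q2   :", q2(labels, predictions))
    print("DQ2  :", dq2(labels, predictions))
```

NMC and AUROC depend only on which side of the threshold, or in which order, the predictions fall, whereas Q² and DQ² depend on the squared distance of each prediction from its class label. For NMC, where lower is better, `permutation_p_value` should be called with `higher_is_better=False`.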
format Online
Article
Text
id pubmed-3337399
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher Springer US
record_format MEDLINE/PubMed
spelling pubmed-3337399 2012-05-14 Double-check: validation of diagnostic statistics for PLS-DA models in metabolomics studies Szymańska, Ewa Saccenti, Edoardo Smilde, Age K. Westerhuis, Johan A. Metabolomics Original Article Springer US 2011-07-08 2012 /pmc/articles/PMC3337399/ /pubmed/22593721 http://dx.doi.org/10.1007/s11306-011-0330-3 Text en © The Author(s) 2011 https://creativecommons.org/licenses/by-nc/4.0/ This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
title Double-check: validation of diagnostic statistics for PLS-DA models in metabolomics studies
topic Original Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3337399/
https://www.ncbi.nlm.nih.gov/pubmed/22593721
http://dx.doi.org/10.1007/s11306-011-0330-3