
A combined test for feature selection on sparse metaproteomics data—an alternative to missing value imputation

One of the difficulties encountered in the statistical analysis of metaproteomics data is the high proportion of missing values, which are usually treated by imputation. Nevertheless, imputation methods are based on restrictive assumptions regarding missingness mechanisms, namely “at random” or “not...


Bibliographic Details
Main authors:	Plancade, Sandra; Berland, Magali; Blein-Nicolas, Mélisande; Langella, Olivier; Bassignani, Ariane; Juste, Catherine
Format:	Online Article Text
Language:	English
Published:	PeerJ Inc. 2022
Subjects:	Bioinformatics
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9235818/ https://www.ncbi.nlm.nih.gov/pubmed/35769140 http://dx.doi.org/10.7717/peerj.13525

author	Plancade, Sandra; Berland, Magali; Blein-Nicolas, Mélisande; Langella, Olivier; Bassignani, Ariane; Juste, Catherine
author_sort	Plancade, Sandra
collection	PubMed
description	One of the difficulties encountered in the statistical analysis of metaproteomics data is the high proportion of missing values, which are usually treated by imputation. Nevertheless, imputation methods are based on restrictive assumptions regarding missingness mechanisms, namely “at random” or “not at random”. To circumvent these limitations in the context of feature selection in a multi-class comparison, we propose a univariate selection method that combines a test of association between missingness and classes, and a test for difference of observed intensities between classes. This approach implicitly handles both missingness mechanisms. We performed a quantitative and qualitative comparison of our procedure with imputation-based feature selection methods on two experimental data sets, as well as simulated data with various scenarios regarding the missingness mechanisms and the nature of the difference of expression (differential intensity or differential presence). Whereas we observed similar performances in terms of prediction on the experimental data set, the feature ranking and selection from various imputation-based methods were strongly divergent. We showed that the combined test reaches a compromise by correlating reasonably with other methods, and remains efficient in all simulated scenarios unlike imputation-based feature selection methods.
format	Online Article Text
id	pubmed-9235818
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	PeerJ Inc.
record_format	MEDLINE/PubMed
spelling	pubmed-9235818 2022-06-28 PeerJ Inc. 2022-06-24 /pmc/articles/PMC9235818/ /pubmed/35769140 http://dx.doi.org/10.7717/peerj.13525 Text en © 2022 Plancade et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
title	A combined test for feature selection on sparse metaproteomics data—an alternative to missing value imputation
topic	Bioinformatics
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9235818/ https://www.ncbi.nlm.nih.gov/pubmed/35769140 http://dx.doi.org/10.7717/peerj.13525
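The description above outlines a univariate procedure that combines a test of association between missingness and classes with a test for a difference of observed intensities between classes. The record does not say which tests are used or how the two results are combined, so the following is only an illustrative sketch under stated assumptions: both tests are run as label permutation tests, the two p-values are merged with Fisher's product rule (which presumes they are roughly independent under the null hypothesis), and every function name is hypothetical rather than taken from the article.

```python
import math
import random
from collections import defaultdict


def presence_stat(values, labels):
    """Chi-square statistic of the (observed / missing) x class table."""
    n = len(values)
    n_obs = sum(v is not None for v in values)
    stat = 0.0
    for c in sorted(set(labels)):
        n_c = sum(l == c for l in labels)
        obs_c = sum(v is not None and l == c for v, l in zip(values, labels))
        for count, total in ((obs_c, n_obs), (n_c - obs_c, n - n_obs)):
            expected = n_c * total / n
            if expected > 0:
                stat += (count - expected) ** 2 / expected
    return stat


def intensity_stat(values, labels):
    """Between-class sum of squares (all values passed in are observed)."""
    if not values:
        return 0.0
    groups = defaultdict(list)
    for v, l in zip(values, labels):
        groups[l].append(v)
    grand = sum(values) / len(values)
    return sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups.values())


def perm_pvalue(stat_fn, values, labels, n_perm, rng):
    """Permutation p-value: share of label shuffles at least as extreme."""
    observed = stat_fn(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if stat_fn(values, shuffled) >= observed - 1e-9:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def combined_test(values, labels, n_perm=999, seed=0):
    """Return (p_presence, p_intensity, p_combined) for one feature.

    Missing intensities are encoded as None. The combination rule is an
    assumption: Fisher's method for two p-values, whose chi-square(4)
    tail probability has the closed form p1*p2*(1 - ln(p1*p2)).
    """
    rng = random.Random(seed)
    p_presence = perm_pvalue(presence_stat, values, labels, n_perm, rng)
    # The intensity test only uses samples where the feature was observed.
    seen = [(v, l) for v, l in zip(values, labels) if v is not None]
    p_intensity = perm_pvalue(
        intensity_stat, [v for v, _ in seen], [l for _, l in seen], n_perm, rng
    )
    product = p_presence * p_intensity
    return p_presence, p_intensity, product * (1.0 - math.log(product))


# Hypothetical feature: mostly observed in class A, mostly missing in class B.
values = [10.1, 10.4, 9.8, 10.0, 10.3, 9.9, None, None, None, None, None, 7.0]
labels = ["A"] * 6 + ["B"] * 6
print(combined_test(values, labels))
```

In this toy feature the signal lies mainly in differential presence, so the missingness test contributes most of the evidence. A feature observed in every sample would instead be driven by the intensity test alone.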


Similar items