Cargando…

Robust classification using average correlations as features (ACF)

MOTIVATION: In single-cell transcriptomics and other omics technologies, large fractions of missing values commonly occur. Researchers often either consider only those features that were measured for each instance of their dataset, thereby accepting severe loss of information, or use imputation whic...

Descripción completa

Detalles Bibliográficos
Autores principales: Schumann, Yannis, Neumann, Julia E., Neumann, Philipp
Formato: Online Artículo Texto
Lenguaje:English
Publicado: BioMed Central 2023
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10026437/
https://www.ncbi.nlm.nih.gov/pubmed/36941542
http://dx.doi.org/10.1186/s12859-023-05224-0
Descripción
Sumario:MOTIVATION: In single-cell transcriptomics and other omics technologies, large fractions of missing values commonly occur. Researchers often either consider only those features that were measured for each instance of their dataset, thereby accepting severe loss of information, or use imputation which can lead to erroneous results. Pairwise metrics allow for imputation-free classification with minimal loss of data. RESULTS: Using pairwise correlations as metric, state-of-the-art approaches to classification would include the K-nearest-neighbor- (KNN) and distribution-based-classification-classifier. Our novel method, termed average correlations as features (ACF), significantly outperforms those approaches by training tunable machine learning models on inter-class and intra-class correlations. Our approach is characterized in simulation studies and its classification performance is demonstrated on real-world datasets from single-cell RNA sequencing and bottom-up proteomics. Furthermore, we demonstrate that variants of our method offer superior flexibility and performance over KNN classifiers and can be used in conjunction with other machine learning methods. In summary, ACF is a flexible method that enables missing value tolerant classification with minimal loss of data. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-023-05224-0.