Orthogonal outlier detection and dimension estimation for improved MDS embedding of biological datasets
Conventional dimensionality reduction methods like Multidimensional Scaling (MDS) are sensitive to the presence of orthogonal outliers, leading to significant defects in the embedding. We introduce a robust MDS method, called DeCOr-MDS (Detection and Correction of Orthogonal outliers using MDS), bas...
Main authors: | Li, Wanxin; Mirone, Jules; Prasad, Ashok; Miolane, Nina; Legrand, Carine; Dao Duc, Khanh |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A., 2023 |
Subjects: | Bioinformatics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10448701/ https://www.ncbi.nlm.nih.gov/pubmed/37637212 http://dx.doi.org/10.3389/fbinf.2023.1211819 |
_version_ | 1785094793384165376 |
---|---|
author | Li, Wanxin; Mirone, Jules; Prasad, Ashok; Miolane, Nina; Legrand, Carine; Dao Duc, Khanh |
author_facet | Li, Wanxin; Mirone, Jules; Prasad, Ashok; Miolane, Nina; Legrand, Carine; Dao Duc, Khanh |
author_sort | Li, Wanxin |
collection | PubMed |
description | Conventional dimensionality reduction methods like Multidimensional Scaling (MDS) are sensitive to the presence of orthogonal outliers, leading to significant defects in the embedding. We introduce a robust MDS method, called DeCOr-MDS (Detection and Correction of Orthogonal outliers using MDS), based on the geometry and statistics of simplices formed by data points, that detects orthogonal outliers and subsequently reduces dimensionality. We validate our method using synthetic datasets, and further show how it can be applied to a variety of large real biological datasets, including cancer image cell data, human microbiome project data and single-cell RNA sequencing data, to address the task of data cleaning and visualization. |
format | Online Article Text |
id | pubmed-10448701 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-10448701 2023-08-25 Orthogonal outlier detection and dimension estimation for improved MDS embedding of biological datasets Li, Wanxin Mirone, Jules Prasad, Ashok Miolane, Nina Legrand, Carine Dao Duc, Khanh Front Bioinform Bioinformatics Conventional dimensionality reduction methods like Multidimensional Scaling (MDS) are sensitive to the presence of orthogonal outliers, leading to significant defects in the embedding. We introduce a robust MDS method, called DeCOr-MDS (Detection and Correction of Orthogonal outliers using MDS), based on the geometry and statistics of simplices formed by data points, that detects orthogonal outliers and subsequently reduces dimensionality. We validate our method using synthetic datasets, and further show how it can be applied to a variety of large real biological datasets, including cancer image cell data, human microbiome project data and single-cell RNA sequencing data, to address the task of data cleaning and visualization. Frontiers Media S.A. 2023-08-10 /pmc/articles/PMC10448701/ /pubmed/37637212 http://dx.doi.org/10.3389/fbinf.2023.1211819 Text en Copyright © 2023 Li, Mirone, Prasad, Miolane, Legrand and Dao Duc. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Bioinformatics Li, Wanxin Mirone, Jules Prasad, Ashok Miolane, Nina Legrand, Carine Dao Duc, Khanh Orthogonal outlier detection and dimension estimation for improved MDS embedding of biological datasets |
title | Orthogonal outlier detection and dimension estimation for improved MDS embedding of biological datasets |
title_full | Orthogonal outlier detection and dimension estimation for improved MDS embedding of biological datasets |
title_fullStr | Orthogonal outlier detection and dimension estimation for improved MDS embedding of biological datasets |
title_full_unstemmed | Orthogonal outlier detection and dimension estimation for improved MDS embedding of biological datasets |
title_short | Orthogonal outlier detection and dimension estimation for improved MDS embedding of biological datasets |
title_sort | orthogonal outlier detection and dimension estimation for improved mds embedding of biological datasets |
topic | Bioinformatics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10448701/ https://www.ncbi.nlm.nih.gov/pubmed/37637212 http://dx.doi.org/10.3389/fbinf.2023.1211819 |