Band-based similarity indices for gene expression classification and clustering

The concept of depth induces an ordering from centre outwards in multivariate data. Most depth definitions are unfeasible for dimensions larger than three or four, but the Modified Band Depth (MBD) is a notable exception that has proven to be a valuable tool in the analysis of high-dimensional gene...

Bibliographic Details
Main author: Torrente, Aurora
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8566472/
https://www.ncbi.nlm.nih.gov/pubmed/34732744
http://dx.doi.org/10.1038/s41598-021-00678-9
_version_ 1784594019811065856
author Torrente, Aurora
collection PubMed
description The concept of depth induces an ordering from centre outwards in multivariate data. Most depth definitions are unfeasible for dimensions larger than three or four, but the Modified Band Depth (MBD) is a notable exception that has proven to be a valuable tool in the analysis of high-dimensional gene expression data. This depth definition relates the centrality of each individual to its (partial) inclusion in all possible bands formed by elements of the data set. We assess (dis)similarity between pairs of observations by accounting for such bands and constructing binary matrices associated with each pair. From these, contingency tables are calculated and used to derive standard similarity indices. Our approach is computationally efficient and can be applied to bands formed by any number of observations from the data set. We have evaluated the performance of several band-based similarity indices with respect to that of other classical distances in standard classification and clustering tasks in a variety of simulated and real data sets. The method is not restricted to the indices evaluated here; extending it to other similarity coefficients is straightforward. Our experiments show the benefits of our technique, with some of the selected indices outperforming, among others, the Euclidean distance.
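The pipeline summarised in the description (bands, binary inclusion matrices, contingency tables, similarity indices) can be illustrated with a minimal Python sketch. It is not the article's implementation. It assumes bands formed by pairs of observations, one band interval per coordinate, and a binary matrix per observation recording which (band, coordinate) cells contain it; the function names `band_indicators`, `modified_band_depth`, `contingency`, `jaccard` and `simple_matching` are hypothetical, and the two indices shown are merely common examples of standard similarity coefficients.

```python
from itertools import combinations

import numpy as np


def band_indicators(data):
    """Boolean array of shape (n_obs, n_bands, n_dims).

    Entry [i, b, k] is True when coordinate k of observation i lies inside
    band b. Assumption: each band is delimited by a pair of observations.
    """
    pairs = list(combinations(range(len(data)), 2))
    lower = np.array([np.minimum(data[i], data[j]) for i, j in pairs])
    upper = np.array([np.maximum(data[i], data[j]) for i, j in pairs])
    # Broadcast (n_obs, 1, n_dims) against (n_bands, n_dims).
    return (data[:, None, :] >= lower) & (data[:, None, :] <= upper)


def modified_band_depth(inside):
    """MBD with two-curve bands: mean proportion of coordinates inside a band."""
    return inside.mean(axis=(1, 2))


def contingency(inside_x, inside_y):
    """2x2 contingency counts (a, b, c, d) for two binary inclusion matrices."""
    a = int(np.sum(inside_x & inside_y))    # both inside
    b = int(np.sum(inside_x & ~inside_y))   # only x inside
    c = int(np.sum(~inside_x & inside_y))   # only y inside
    d = int(np.sum(~inside_x & ~inside_y))  # both outside
    return a, b, c, d


def jaccard(a, b, c, d):
    """Jaccard index; ignores cells where both observations are outside."""
    return a / (a + b + c) if (a + b + c) else 1.0


def simple_matching(a, b, c, d):
    """Simple matching coefficient; counts joint presence and joint absence."""
    return (a + d) / (a + b + c + d)


# Toy data: three observations measured on two coordinates (illustrative only).
data = np.array([[0.0, 0.0], [1.0, 2.0], [2.0, 1.0]])
inside = band_indicators(data)
depths = modified_band_depth(inside)
table = contingency(inside[1], inside[2])
similarity = jaccard(*table)
```

A dissimilarity for classification or clustering could then be taken as one minus the chosen index, which is again an assumption of this sketch rather than a statement about the article.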
format Online
Article
Text
id pubmed-8566472
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-8566472 2021-11-04 Band-based similarity indices for gene expression classification and clustering Torrente, Aurora Sci Rep Article
Nature Publishing Group UK 2021-11-03 /pmc/articles/PMC8566472/ /pubmed/34732744 http://dx.doi.org/10.1038/s41598-021-00678-9 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
title Band-based similarity indices for gene expression classification and clustering
topic Article