
Amalgams: data-driven amalgamation for the dimensionality reduction of compositional data

Many next-generation sequencing datasets contain only relative information because of biological and technical factors that limit the total number of transcripts observed for a given sample. It is not possible to interpret any one component in isolation. The field of compositional data analysis has...

Bibliographic Details
Main authors: Quinn, Thomas P, Erb, Ionas
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7671324/
https://www.ncbi.nlm.nih.gov/pubmed/33575624
http://dx.doi.org/10.1093/nargab/lqaa076
author Quinn, Thomas P
Erb, Ionas
collection PubMed
description Many next-generation sequencing datasets contain only relative information because of biological and technical factors that limit the total number of transcripts observed for a given sample. It is not possible to interpret any one component in isolation. The field of compositional data analysis has emerged with alternative methods for relative data based on log-ratio transforms. However, these data often contain many more features than samples, and thus require creative new ways to reduce the dimensionality of the data. The summation of parts, called amalgamation, is a practical way of reducing dimensionality, but can introduce a non-linear distortion to the data. We exploit this non-linearity to propose a powerful yet interpretable dimension reduction method called data-driven amalgamation. Our new method, implemented in the user-friendly R package amalgam, can reduce the dimensionality of compositional data by finding amalgamations that optimally (i) preserve the distance between samples, or (ii) classify samples as diseased or not. Our benchmark on 13 real datasets confirms that these amalgamations compete with state-of-the-art methods in terms of performance, but result in new features that are easily understood: they are groups of parts added together.
format Online
Article
Text
id pubmed-7671324
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-7671324 2021-02-10 Amalgams: data-driven amalgamation for the dimensionality reduction of compositional data Quinn, Thomas P Erb, Ionas NAR Genom Bioinform Standard Article Oxford University Press 2020-10-02 /pmc/articles/PMC7671324/ /pubmed/33575624 http://dx.doi.org/10.1093/nargab/lqaa076 Text en © The Author(s) 2019. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics.
http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Amalgams: data-driven amalgamation for the dimensionality reduction of compositional data
topic Standard Article
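
The description above says that amalgamation (summing parts) reduces dimensionality but distorts log-ratio geometry non-linearly. The following is a minimal Python sketch of that idea only. It is not the API of the R package amalgam, and it does not attempt the data-driven search for groups; the helper names (`closure`, `amalgamate`, `clr`, `aitchison_distance`), the two 4-part samples, and the fixed grouping are all hypothetical, chosen for illustration.

```python
import math


def closure(parts):
    """Rescale positive parts so that they sum to 1 (a composition)."""
    total = sum(parts)
    return [p / total for p in parts]


def amalgamate(parts, groups):
    """Sum the parts inside each group; `groups` is a list of index lists."""
    return [sum(parts[i] for i in idx) for idx in groups]


def clr(parts):
    """Centered log-ratio: log of each part minus the mean of the logs."""
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def aitchison_distance(x, y):
    """Euclidean distance between clr-transformed compositions."""
    return math.dist(clr(x), clr(y))


# Two hypothetical 4-part samples (assumed values, not data from the article).
a = closure([10, 20, 30, 40])
b = closure([40, 30, 20, 10])

# A fixed, hand-picked grouping: parts 0+1 and parts 2+3 are added together.
groups = [[0, 1], [2, 3]]
a2 = amalgamate(a, groups)
b2 = amalgamate(b, groups)

d_full = aitchison_distance(a, b)
d_amalg = aitchison_distance(a2, b2)

print("amalgamated a:", a2)
print("distance on 4 parts:", d_full)
print("distance on 2 amalgams:", d_amalg)
```

The amalgamated samples still sum to 1, but the between-sample log-ratio distance changes, because the log of a sum is not the sum of the logs. A data-driven approach, as the description puts it, would choose the grouping to optimise an objective such as distance preservation or classification rather than fixing it by hand as done here.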