A t-SNE Based Classification Approach to Compositional Microbiome Data
As a data-driven dimensionality reduction and visualization tool, t-distributed stochastic neighborhood embedding (t-SNE) has been successfully applied to a variety of fields. In recent years, it has also received increasing attention for classification and regression analysis. This study presented...
Main authors: | Xu, Xueli, Xie, Zhongming, Yang, Zhenyu, Li, Dongfang, Xu, Ximing |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2020 |
Subjects: | Genetics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7767995/ https://www.ncbi.nlm.nih.gov/pubmed/33381156 http://dx.doi.org/10.3389/fgene.2020.620143 |
_version_ | 1783629084918546432 |
---|---|
author | Xu, Xueli Xie, Zhongming Yang, Zhenyu Li, Dongfang Xu, Ximing |
author_facet | Xu, Xueli Xie, Zhongming Yang, Zhenyu Li, Dongfang Xu, Ximing |
author_sort | Xu, Xueli |
collection | PubMed |
description | As a data-driven dimensionality reduction and visualization tool, t-distributed stochastic neighborhood embedding (t-SNE) has been successfully applied to a variety of fields. In recent years, it has also received increasing attention for classification and regression analysis. This study presented a t-SNE based classification approach for compositional microbiome data, which enabled us to build classifiers and classify new samples in the reduced dimensional space produced by t-SNE. The Aitchison distance was employed to modify the conditional probabilities in t-SNE to account for the compositionality of microbiome data. To classify a new sample, its low-dimensional features were obtained as the weighted mean vector of its nearest neighbors in the training set. Using the low-dimensional features as input, three commonly used machine learning algorithms, logistic regression (LR), support vector machine (SVM), and decision tree (DT) were considered for classification tasks in this study. The proposed approach was applied to two disease-associated microbiome datasets, achieving better classification performance compared with the classifiers built in the original high-dimensional space. The analytic results also showed that t-SNE with Aitchison distance led to improvement of classification accuracy in both datasets. In conclusion, we have developed a t-SNE based classification approach that is suitable for compositional microbiome data and may also serve as a baseline for more complex classification models. |
format | Online Article Text |
id | pubmed-7767995 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-77679952020-12-29 A t-SNE Based Classification Approach to Compositional Microbiome Data Xu, Xueli Xie, Zhongming Yang, Zhenyu Li, Dongfang Xu, Ximing Front Genet Genetics As a data-driven dimensionality reduction and visualization tool, t-distributed stochastic neighborhood embedding (t-SNE) has been successfully applied to a variety of fields. In recent years, it has also received increasing attention for classification and regression analysis. This study presented a t-SNE based classification approach for compositional microbiome data, which enabled us to build classifiers and classify new samples in the reduced dimensional space produced by t-SNE. The Aitchison distance was employed to modify the conditional probabilities in t-SNE to account for the compositionality of microbiome data. To classify a new sample, its low-dimensional features were obtained as the weighted mean vector of its nearest neighbors in the training set. Using the low-dimensional features as input, three commonly used machine learning algorithms, logistic regression (LR), support vector machine (SVM), and decision tree (DT) were considered for classification tasks in this study. The proposed approach was applied to two disease-associated microbiome datasets, achieving better classification performance compared with the classifiers built in the original high-dimensional space. The analytic results also showed that t-SNE with Aitchison distance led to improvement of classification accuracy in both datasets. In conclusion, we have developed a t-SNE based classification approach that is suitable for compositional microbiome data and may also serve as a baseline for more complex classification models. Frontiers Media S.A. 2020-12-14 /pmc/articles/PMC7767995/ /pubmed/33381156 http://dx.doi.org/10.3389/fgene.2020.620143 Text en Copyright © 2020 Xu, Xie, Yang, Li and Xu. 
http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Genetics Xu, Xueli Xie, Zhongming Yang, Zhenyu Li, Dongfang Xu, Ximing A t-SNE Based Classification Approach to Compositional Microbiome Data |
title | A t-SNE Based Classification Approach to Compositional Microbiome Data |
title_full | A t-SNE Based Classification Approach to Compositional Microbiome Data |
title_fullStr | A t-SNE Based Classification Approach to Compositional Microbiome Data |
title_full_unstemmed | A t-SNE Based Classification Approach to Compositional Microbiome Data |
title_short | A t-SNE Based Classification Approach to Compositional Microbiome Data |
title_sort | t-sne based classification approach to compositional microbiome data |
topic | Genetics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7767995/ https://www.ncbi.nlm.nih.gov/pubmed/33381156 http://dx.doi.org/10.3389/fgene.2020.620143 |