
A t-SNE Based Classification Approach to Compositional Microbiome Data

As a data-driven dimensionality reduction and visualization tool, t-distributed stochastic neighbor embedding (t-SNE) has been successfully applied in a variety of fields. In recent years, it has also received increasing attention for classification and regression analysis. This study presented...


Bibliographic Details
Main Authors: Xu, Xueli; Xie, Zhongming; Yang, Zhenyu; Li, Dongfang; Xu, Ximing
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7767995/
https://www.ncbi.nlm.nih.gov/pubmed/33381156
http://dx.doi.org/10.3389/fgene.2020.620143
author Xu, Xueli
Xie, Zhongming
Yang, Zhenyu
Li, Dongfang
Xu, Ximing
author_sort Xu, Xueli
collection PubMed
description As a data-driven dimensionality reduction and visualization tool, t-distributed stochastic neighbor embedding (t-SNE) has been successfully applied in a variety of fields. In recent years, it has also received increasing attention for classification and regression analysis. This study presented a t-SNE based classification approach for compositional microbiome data, which enabled us to build classifiers and classify new samples in the reduced-dimensional space produced by t-SNE. The Aitchison distance was employed to modify the conditional probabilities in t-SNE to account for the compositionality of microbiome data. To classify a new sample, its low-dimensional features were obtained as the weighted mean vector of its nearest neighbors in the training set. Using the low-dimensional features as input, three commonly used machine learning algorithms, logistic regression (LR), support vector machine (SVM), and decision tree (DT), were considered for classification tasks in this study. The proposed approach was applied to two disease-associated microbiome datasets and achieved better classification performance than classifiers built in the original high-dimensional space. The analytic results also showed that t-SNE with the Aitchison distance improved classification accuracy in both datasets. In conclusion, we have developed a t-SNE based classification approach that is suitable for compositional microbiome data and may also serve as a baseline for more complex classification models.
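The two steps named in the description, the Aitchison distance and the placement of a new sample at the weighted mean of its nearest training neighbors, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the `pseudocount` used to handle zero abundances, the neighbor count `k`, and the inverse-distance weights are all assumptions made for illustration; the record does not specify how the weights or zeros were handled. The sketch relies on the standard identity that the Aitchison distance equals the Euclidean distance between centered log-ratio (CLR) transformed compositions.

```python
import numpy as np


def clr(x, pseudocount=1e-6):
    """Centered log-ratio transform of each row (one composition per row).

    Rows are first closed to sum to 1; the pseudocount (an assumption of
    this sketch) avoids log(0) for taxa with zero abundance.
    """
    x = np.asarray(x, dtype=float)
    x = x / x.sum(axis=1, keepdims=True)
    logx = np.log(x + pseudocount)
    return logx - logx.mean(axis=1, keepdims=True)


def aitchison_distance(a, b, pseudocount=1e-6):
    """Pairwise Aitchison distances between rows of a and rows of b,
    computed as Euclidean distances in CLR space."""
    ca, cb = clr(a, pseudocount), clr(b, pseudocount)
    diff = ca[:, None, :] - cb[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def embed_new_sample(x_new, x_train, y_train_embedded, k=3):
    """Place a new sample in the low-dimensional space as the weighted
    mean of the embeddings of its k nearest training samples.

    Inverse-distance weights are a hypothetical choice for this sketch.
    """
    x_new = np.asarray(x_new, dtype=float)
    d = aitchison_distance(x_new[None, :], x_train)[0]
    nearest = np.argsort(d)[:k]
    weights = 1.0 / (d[nearest] + 1e-12)  # small constant guards against d == 0
    weights /= weights.sum()
    return weights @ np.asarray(y_train_embedded, dtype=float)[nearest]
```

In such a setup, `y_train_embedded` would hold the t-SNE coordinates of the training samples, and the vector returned by `embed_new_sample` could be passed to any standard classifier (for example logistic regression, a support vector machine, or a decision tree) trained on those coordinates.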
format Online
Article
Text
id pubmed-7767995
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-7767995 2020-12-29 A t-SNE Based Classification Approach to Compositional Microbiome Data. Xu, Xueli; Xie, Zhongming; Yang, Zhenyu; Li, Dongfang; Xu, Ximing. Front Genet. Genetics. Frontiers Media S.A. 2020-12-14 /pmc/articles/PMC7767995/ /pubmed/33381156 http://dx.doi.org/10.3389/fgene.2020.620143 Text en Copyright © 2020 Xu, Xie, Yang, Li and Xu.
http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title A t-SNE Based Classification Approach to Compositional Microbiome Data
topic Genetics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7767995/
https://www.ncbi.nlm.nih.gov/pubmed/33381156
http://dx.doi.org/10.3389/fgene.2020.620143