TADA: phylogenetic augmentation of microbiome samples enhances phenotype classification

Bibliographic details
Main authors:	Sayyari, Erfan; Kawas, Ban; Mirarab, Siavash
Format:	Online Article Text
Language:	English
Published:	Oxford University Press, 2019
Subjects:	Ismb/Eccb 2019 Conference Proceedings
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6612822/ https://www.ncbi.nlm.nih.gov/pubmed/31510701 http://dx.doi.org/10.1093/bioinformatics/btz394

collection	PubMed
description	MOTIVATION: Learning associations of traits with the microbial composition of a set of samples is a fundamental goal in microbiome studies. Recently, machine learning methods have been explored for this goal, with some promise. However, in comparison to other fields, microbiome data are high-dimensional and not abundant, leading to a high-dimensional low-sample-size under-determined system. Moreover, microbiome data are often unbalanced and biased. Given such training data, machine learning methods often fail to perform a classification task with sufficient accuracy. Lack of signal is especially problematic when classes are represented in an unbalanced way in the training data, with some classes under-represented. The presence of inter-correlations among subsets of observations further compounds these issues. As a result, machine learning methods have had only limited success in predicting many traits from the microbiome. Data augmentation, which consists of building synthetic samples and adding them to the training data, is a technique that has proved helpful for many machine learning tasks. RESULTS: In this paper, we propose a new data augmentation technique for classifying phenotypes based on the microbiome. Our algorithm, called TADA, uses available data and a statistical generative model to create new samples augmenting existing ones, addressing issues of low sample size. In generating new samples, TADA takes into account phylogenetic relationships between microbial species. On two real datasets, we show that adding these synthetic samples to the training set improves the accuracy of downstream classification, especially when the training data have an unbalanced representation of classes. AVAILABILITY AND IMPLEMENTATION: TADA is available at https://github.com/tada-alg/TADA. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
id	pubmed-6612822
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
journal	Bioinformatics
published	2019-07 (issue); 2019-07-05 (online)
rights	© The Author(s) 2019. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
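The description field summarizes TADA as a statistical generative model that creates synthetic training samples while taking phylogenetic relationships into account, with the largest gains when classes are unbalanced. As an illustration only, the Python sketch below shows one way a tree-aware generator could work; it is not TADA's implementation (that is at the GitHub repository given in the description). It assumes a rooted binary tree and, at each internal node, splits the reads reaching that node between its two children with a beta-binomial draw centred on the split observed in a real template sample. The names `Node`, `augment`, `balance`, and the `concentration` parameter are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical rooted binary tree node; leaves carry a taxon name."""
    name: Optional[str] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def subtree_count(node: Node, counts: Dict[str, int]) -> int:
    """Total reads assigned to the leaves below `node`."""
    if node.is_leaf():
        return counts.get(node.name, 0)
    return subtree_count(node.left, counts) + subtree_count(node.right, counts)


def binomial(n: int, p: float, rng: random.Random) -> int:
    """Binomial draw as a sum of n Bernoulli trials (stdlib only)."""
    return sum(1 for _ in range(n) if rng.random() < p)


def augment(root: Node, counts: Dict[str, int], concentration: float,
            rng: random.Random) -> Dict[str, int]:
    """Draw one synthetic sample with the same total depth as `counts`.

    Assumed, illustrative model (not TADA's): at each internal node the
    reads are split between the children by a beta-binomial draw whose
    mean is the template's observed split. Larger `concentration` keeps
    the synthetic sample closer to the template.
    """
    out: Dict[str, int] = {}

    def split(node: Node, n: int) -> None:
        if node.is_leaf():
            out[node.name] = n
            return
        left_obs = subtree_count(node.left, counts)
        right_obs = subtree_count(node.right, counts)
        # A 0.5 pseudocount on each side keeps both Beta parameters positive;
        # the two parameters sum to `concentration`.
        total = left_obs + right_obs + 1.0
        p = rng.betavariate(concentration * (left_obs + 0.5) / total,
                            concentration * (right_obs + 0.5) / total)
        n_left = binomial(n, p, rng)
        split(node.left, n_left)
        split(node.right, n - n_left)

    split(root, subtree_count(root, counts))
    return out


def balance(samples: List[Tuple[Dict[str, int], str]], root: Node,
            concentration: float = 50.0,
            seed: int = 0) -> List[Tuple[Dict[str, int], str]]:
    """Append synthetic samples until every class is as large as the largest."""
    rng = random.Random(seed)
    by_class: Dict[str, List[Dict[str, int]]] = {}
    for counts, label in samples:
        by_class.setdefault(label, []).append(counts)
    target = max(len(group) for group in by_class.values())
    result = list(samples)
    for label, group in by_class.items():
        for i in range(target - len(group)):
            # Cycle through the real samples of this class as templates.
            template = group[i % len(group)]
            result.append((augment(root, template, concentration, rng), label))
    return result
```

For example, with `tree = Node(left=Node(left=Node("A"), right=Node("B")), right=Node("C"))`, two "healthy" samples, and one "disease" sample, `balance(data, tree, seed=1)` returns the three real samples followed by one synthetic "disease" sample with the same total read count as the real one. Splitting reads top-down along the tree makes the counts of closely related taxa vary together rather than independently, which is the reason for making the generator phylogeny-aware in this sketch.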
