
Variance Component Selection With Applications to Microbiome Taxonomic Data

High-throughput sequencing technology has enabled population-based studies of the role of the human microbiome in disease etiology and exposure response. Microbiome data are summarized as counts or composition of the bacterial taxa at different taxonomic levels. An important problem is to identify t...


Bibliographic Details
Main Authors: Zhai, Jing; Kim, Juhyun; Knox, Kenneth S.; Twigg, Homer L.; Zhou, Hua; Zhou, Jin J.
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2018
Subjects: Microbiology
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5883493/
https://www.ncbi.nlm.nih.gov/pubmed/29643839
http://dx.doi.org/10.3389/fmicb.2018.00509
author Zhai, Jing
Kim, Juhyun
Knox, Kenneth S.
Twigg, Homer L.
Zhou, Hua
Zhou, Jin J.
collection PubMed
description High-throughput sequencing technology has enabled population-based studies of the role of the human microbiome in disease etiology and exposure response. Microbiome data are summarized as counts or composition of the bacterial taxa at different taxonomic levels. An important problem is to identify the bacterial taxa that are associated with a response. One method is to test the association of a specific taxon with phenotypes in a linear mixed effect model, which incorporates phylogenetic information among bacterial communities. Another type of approach considers all taxa in a joint model and achieves selection via a penalization method, which ignores phylogenetic information. In this paper, we consider regression analysis by treating bacterial taxa at different levels as multiple random effects. For each taxon, a kernel matrix is calculated based on distance measures in the phylogenetic tree and acts as one variance component in the joint model. Taxonomic selection is then achieved by a lasso (least absolute shrinkage and selection operator) penalty on the variance components. Our method integrates biological information into the variable selection problem and greatly improves selection accuracy. Simulation studies demonstrate the superiority of our method over existing methods such as group-lasso. Finally, we apply our method to a longitudinal microbiome study of Human Immunodeficiency Virus (HIV)-infected patients. We implement our method in the high-performance computing language Julia. Software and detailed documentation are freely available at https://github.com/JingZhai63/VCselection.
format Online
Article
Text
id pubmed-5883493
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-5883493 2018-04-11 Variance Component Selection With Applications to Microbiome Taxonomic Data Zhai, Jing Kim, Juhyun Knox, Kenneth S. Twigg, Homer L. Zhou, Hua Zhou, Jin J. Front Microbiol Microbiology Frontiers Media S.A.
2018-03-28 /pmc/articles/PMC5883493/ /pubmed/29643839 http://dx.doi.org/10.3389/fmicb.2018.00509 Text en Copyright © 2018 Zhai, Kim, Knox, Twigg, Zhou and Zhou. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Variance Component Selection With Applications to Microbiome Taxonomic Data
topic Microbiology