Cargando…

Type 2 Diabetes Biomarkers of Human Gut Microbiota Selected via Iterative Sure Independent Screening Method

Type 2 diabetes, which is a complex metabolic disease influenced by genetic and environment, has become a worldwide problem. Previous published results focused on genetic components through genome-wide association studies that just interpret this disease to some extent. Recently, two research groups...

Descripción completa

Detalles Bibliográficos
Autores principales: Cai, Lihua, Wu, Honglong, Li, Dongfang, Zhou, Ke, Zou, Fuhao
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Public Library of Science 2015
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4610706/
https://www.ncbi.nlm.nih.gov/pubmed/26479726
http://dx.doi.org/10.1371/journal.pone.0140827
Descripción
Sumario:Type 2 diabetes, which is a complex metabolic disease influenced by genetic and environment, has become a worldwide problem. Previous published results focused on genetic components through genome-wide association studies that just interpret this disease to some extent. Recently, two research groups published metagenome-wide association studies (MGWAS) result that found meta-biomarkers related with type 2 diabetes. However, One key problem of analyzing genomic data is that how to deal with the ultra-high dimensionality of features. From a statistical viewpoint it is challenging to filter true factors in high dimensional data. Various methods and techniques have been proposed on this issue, which can only achieve limited prediction performance and poor interpretability. New statistical procedure with higher performance and clear interpretability is appealing in analyzing high dimensional data. To address this problem, we apply an excellent statistical variable selection procedure called iterative sure independence screening to gene profiles that obtained from metagenome sequencing, and 48/24 meta-markers were selected in Chinese/European cohorts as predictors with 0.97/0.99 accuracy in AUC (area under the curve), which showed a better performance than other model selection methods, respectively. These results demonstrate the power and utility of data mining technologies within the large-scale and ultra-high dimensional genomic-related dataset for diagnostic and predictive markers identifying.