
Gene Expression Value Prediction Based on XGBoost Algorithm

Bibliographic Details
Main authors: Li, Wei; Yin, Yanbin; Quan, Xiongwen; Zhang, Han
Format: Online, Article, Text
Language: English
Published: Frontiers Media S.A., 2019
Subjects: Genetics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6861218/
https://www.ncbi.nlm.nih.gov/pubmed/31781160
http://dx.doi.org/10.3389/fgene.2019.01077
collection PubMed
description Gene expression profiling is widely used to characterize cell status, to reflect the health of the body, and to diagnose genetic diseases, among other applications. Although the cost of genome-wide expression profiling has been decreasing in recent years, collecting expression profiles for thousands of genes is still very expensive. Because gene expression levels are usually highly correlated in humans, the expression values of the remaining target genes can be predicted from the values of 943 landmark genes. We therefore designed an algorithm for predicting gene expression values based on XGBoost, which integrates multiple tree models and offers stronger interpretability. We tested the XGBoost model on the GEO dataset and the RNA-seq dataset and compared the results with those of existing models. In these experiments, the XGBoost model achieved a significantly lower overall error than the existing D-GEX algorithm, linear regression, and KNN methods. In conclusion, the XGBoost algorithm outperforms the compared models and will be a significant contribution to the toolbox for gene expression value prediction.
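The abstract does not spell out model details. As a rough illustration of the general idea, one way to frame the task is to fit one regressor per target gene, using landmark-gene expression as the input features. The sketch below is a minimal from-scratch example of second-order gradient boosting with depth-1 trees under squared-error loss, using the XGBoost-style split score and leaf weight w = -G / (H + lambda). It is not the xgboost library and not the authors' implementation; the function names, toy data, and parameter values (`n_rounds`, `eta`, `lam`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class Stump:
    """Depth-1 regression tree: one split on one landmark gene's expression."""
    feature: int
    threshold: float
    left_weight: float
    right_weight: float

    def predict(self, x: Sequence[float]) -> float:
        return self.left_weight if x[self.feature] <= self.threshold else self.right_weight


def fit_stump(X: List[List[float]], grad: List[float], lam: float) -> Stump:
    """Choose the split with the largest XGBoost-style structure score.

    For squared error every hessian is 1, so H is just a sample count.
    Leaf weights use the closed form w = -G / (H + lam).
    """
    n, total_g = len(X), sum(grad)
    # Fallback when no split helps: a single leaf covering every sample.
    best = Stump(0, float("inf"), -total_g / (n + lam), -total_g / (n + lam))
    # A split is kept only if it beats the no-split score (positive gain, gamma = 0).
    best_score = total_g ** 2 / (n + lam)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X})[:-1]:  # max value would empty the right leaf
            gl = sum(g for row, g in zip(X, grad) if row[j] <= t)
            hl = sum(1 for row in X if row[j] <= t)
            gr, hr = total_g - gl, n - hl
            score = gl ** 2 / (hl + lam) + gr ** 2 / (hr + lam)
            if score > best_score:
                best_score = score
                best = Stump(j, t, -gl / (hl + lam), -gr / (hr + lam))
    return best


@dataclass
class BoostedStumps:
    base: float
    eta: float
    stumps: List[Stump] = field(default_factory=list)

    def predict(self, x: Sequence[float]) -> float:
        return self.base + self.eta * sum(s.predict(x) for s in self.stumps)


def fit_target_gene(X: List[List[float]], y: List[float],
                    n_rounds: int = 100, eta: float = 0.3, lam: float = 1.0) -> BoostedStumps:
    """Fit one model mapping landmark-gene expression to one target gene."""
    model = BoostedStumps(base=sum(y) / len(y), eta=eta)
    pred = [model.base] * len(y)
    for _ in range(n_rounds):
        # Gradient of 0.5 * (pred - y)^2 with respect to pred.
        grad = [p - t for p, t in zip(pred, y)]
        stump = fit_stump(X, grad, lam)
        model.stumps.append(stump)
        pred = [p + eta * stump.predict(row) for p, row in zip(pred, X)]
    return model


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


# Made-up toy data: 8 samples x 3 hypothetical landmark genes.
X = [[0.1, 1.2, 3.0], [0.4, 0.9, 2.5], [0.8, 1.5, 2.9], [1.1, 0.3, 3.3],
     [1.5, 1.1, 2.1], [1.9, 0.7, 2.8], [2.3, 1.9, 3.1], [2.6, 0.5, 2.4]]
y = [2.0 * r[0] - 0.5 * r[1] for r in X]  # synthetic target-gene expression

model = fit_target_gene(X, y)
boosted_mae = mean_absolute_error(y, [model.predict(r) for r in X])
baseline_mae = mean_absolute_error(y, [sum(y) / len(y)] * len(y))
print(f"training MAE: boosted={boosted_mae:.3f}, mean baseline={baseline_mae:.3f}")
```

Predicting many target genes would repeat `fit_target_gene` once per gene; a production setup would more likely use a real gradient-boosting library and evaluate on held-out samples rather than training error.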
id pubmed-6861218
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-6861218 2019-11-28 Gene Expression Value Prediction Based on XGBoost Algorithm. Li, Wei; Yin, Yanbin; Quan, Xiongwen; Zhang, Han. Front Genet, Genetics. Frontiers Media S.A. 2019-11-12 /pmc/articles/PMC6861218/ /pubmed/31781160 http://dx.doi.org/10.3389/fgene.2019.01077 Text en. Copyright © 2019 Li, Yin, Quan and Zhang. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
topic Genetics