
Optimizing gene set annotations combining GO structure and gene expression data

BACKGROUND: With the rapid accumulation of genomic data, annotating and interpreting these data has become a challenging issue. As a representative approach, gene set enrichment analysis has been widely used to interpret large molecular datasets generated by biological experiments. The result of gene set enri...


Bibliographic Details
Main Authors: Wang, Dong; Li, Jie; Liu, Rui; Wang, Yadong
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects: Research
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6311910/
https://www.ncbi.nlm.nih.gov/pubmed/30598093
http://dx.doi.org/10.1186/s12918-018-0659-6
_version_ 1783383699236061184
author Wang, Dong
Li, Jie
Liu, Rui
Wang, Yadong
collection PubMed
description BACKGROUND: With the rapid accumulation of genomic data, annotating and interpreting these data has become a challenging issue. As a representative approach, gene set enrichment analysis has been widely used to interpret large molecular datasets generated by biological experiments. The result of gene set enrichment analysis relies heavily on the quality and integrity of gene set annotations. Although several methods have been developed to annotate gene sets, high-quality annotation methods are still lacking. Here, we propose a novel method that improves annotation accuracy by combining the GO structure and gene expression data. RESULTS: We propose a novel approach for optimizing gene set annotations to obtain more accurate annotation results. The proposed method filters inconsistent annotations using GO structure information and probabilistic gene set clusters calculated over a range of cluster sizes across multiple bootstrap-resampled datasets. The proposed method is applied to p53 cell line, colon cancer, and breast cancer gene expression data. The experimental results show that the proposed method can filter out a number of annotations unrelated to the experimental data, increase gene set enrichment power, and decrease the inconsistency of annotations. CONCLUSIONS: A novel gene set annotation optimization approach is proposed to improve the quality of gene annotations. Experimental results indicate that the proposed method effectively improves gene set annotation quality based on the GO structure and gene expression data.
format Online
Article
Text
id pubmed-6311910
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6311910 2019-01-07 Optimizing gene set annotations combining GO structure and gene expression data Wang, Dong Li, Jie Liu, Rui Wang, Yadong BMC Syst Biol Research
BioMed Central 2018-12-31 /pmc/articles/PMC6311910/ /pubmed/30598093 http://dx.doi.org/10.1186/s12918-018-0659-6 Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Optimizing gene set annotations combining GO structure and gene expression data
topic Research