Optimizing gene set annotations combining GO structure and gene expression data
Main authors: | Wang, Dong; Li, Jie; Liu, Rui; Wang, Yadong |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2018 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6311910/ https://www.ncbi.nlm.nih.gov/pubmed/30598093 http://dx.doi.org/10.1186/s12918-018-0659-6 |
_version_ | 1783383699236061184 |
---|---|
author | Wang, Dong; Li, Jie; Liu, Rui; Wang, Yadong |
collection | PubMed |
description | BACKGROUND: With the rapid accumulation of genomic data, annotating and interpreting these data has become a challenging problem. Gene set enrichment analysis, a representative approach, has been widely used to interpret large molecular datasets generated by biological experiments. The results of gene set enrichment analysis rely heavily on the quality and integrity of gene set annotations. Although several methods have been developed to annotate gene sets, high-quality annotation methods are still lacking. Here, we propose a novel method that improves annotation accuracy by combining GO structure and gene expression data. RESULTS: We propose a novel approach for optimizing gene set annotations to obtain more accurate annotation results. The proposed method filters inconsistent annotations using GO structure information and probabilistic gene set clusters computed over a range of cluster sizes on multiple bootstrap-resampled datasets. The method is applied to p53 cell line, colon cancer, and breast cancer gene expression data. The experimental results show that the proposed method can filter out a number of annotations unrelated to the experimental data, increase gene set enrichment power, and reduce annotation inconsistency. CONCLUSIONS: A novel gene set annotation optimization approach is proposed to improve the quality of gene annotations. Experimental results indicate that the proposed method effectively improves gene set annotation quality based on GO structure and gene expression data. |
format | Online Article Text |
id | pubmed-6311910 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6311910 2019-01-07. BMC Syst Biol, Research. BioMed Central, 2018-12-31. /pmc/articles/PMC6311910/ /pubmed/30598093 http://dx.doi.org/10.1186/s12918-018-0659-6. Text, en. © The Author(s) 2018. Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | Optimizing gene set annotations combining GO structure and gene expression data |
topic | Research |
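The abstract's RESULTS section mentions probabilistic gene set clusters computed over a range of cluster sizes on multiple bootstrap-resampled datasets. This record does not give the algorithm, so the following is only a minimal sketch under stated assumptions, not the authors' method: it uses plain k-means (an assumption; the clustering algorithm is not specified here), resamples the samples (columns) with replacement, estimates how often each gene pair shares a cluster, and then applies a hypothetical threshold filter to one annotated gene set. The GO-structure part of the method is not modeled. All function names, parameters, and the toy data are hypothetical.

```python
import random
from itertools import combinations


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means (assumed clustering step); returns one label per point."""
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
        # Move each center to the mean of its members; keep it if the cluster is empty.
        for c in range(k):
            members = [points[i] for i in range(len(points)) if labels[i] == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def coclustering_probability(expr, k_values, n_boot=50, seed=0):
    """Hypothetical helper: fraction of (bootstrap, k) runs in which each gene
    pair lands in the same cluster.

    expr maps gene -> expression values across samples; each bootstrap
    resamples the samples (columns) with replacement.
    """
    rng = random.Random(seed)
    genes = sorted(expr)
    n_samples = len(expr[genes[0]])
    together = {pair: 0 for pair in combinations(genes, 2)}
    runs = 0
    for _ in range(n_boot):
        cols = [rng.randrange(n_samples) for _ in range(n_samples)]
        points = [[expr[g][j] for j in cols] for g in genes]
        for k in k_values:  # range of cluster sizes
            labels = kmeans(points, k, rng)
            runs += 1
            for (i, a), (j, b) in combinations(enumerate(genes), 2):
                if labels[i] == labels[j]:
                    together[(a, b)] += 1
    return {pair: count / runs for pair, count in together.items()}


def filter_annotation(gene_set, prob, threshold=0.5):
    """Hypothetical filter: keep genes whose mean co-clustering probability
    with the other annotated genes is at least `threshold`."""
    kept = set()
    for g in gene_set:
        others = [h for h in gene_set if h != g]
        if not others:
            continue
        mean_p = sum(prob[tuple(sorted((g, h)))] for h in others) / len(others)
        if mean_p >= threshold:
            kept.add(g)
    return kept


if __name__ == "__main__":
    # Toy data: A, B, C rise across six samples; X, Y fall.
    expr = {
        "A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
        "B": [1.1, 2.1, 3.1, 4.1, 5.1, 6.1],
        "C": [0.9, 1.9, 2.9, 3.9, 4.9, 5.9],
        "X": [6.0, 5.0, 4.0, 3.0, 2.0, 1.0],
        "Y": [6.1, 5.1, 4.1, 3.1, 2.1, 1.1],
    }
    prob = coclustering_probability(expr, k_values=[2, 3])
    # A hypothetical GO term annotated with A, B, C and X.
    print(filter_annotation({"A", "B", "C", "X"}, prob, threshold=0.3))
```

Averaging over several cluster sizes, rather than fixing one k, keeps the pairwise probabilities from depending on a single arbitrary choice of cluster count; the threshold value here is illustrative only.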