
Optimizing gene set annotations combining GO structure and gene expression data

Bibliographic Details
Main authors: Wang, Dong, Li, Jie, Liu, Rui, Wang, Yadong
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6311910/
https://www.ncbi.nlm.nih.gov/pubmed/30598093
http://dx.doi.org/10.1186/s12918-018-0659-6
Description
Summary: BACKGROUND: With the rapid accumulation of genomic data, annotating and interpreting these data has become a challenging issue. As a representative approach, gene set enrichment analysis has been widely used to interpret large molecular datasets generated by biological experiments. The results of gene set enrichment analysis rely heavily on the quality and integrity of gene set annotations. Although several methods have been developed to annotate gene sets, high-quality annotation methods are still lacking. Here, we propose a novel method to improve annotation accuracy by combining the GO structure and gene expression data. RESULTS: We propose a novel approach for optimizing gene set annotations to obtain more accurate annotation results. The proposed method filters inconsistent annotations using GO structure information and probabilistic gene set clusters calculated over a range of cluster sizes on multiple bootstrap-resampled datasets. The proposed method is applied to p53 cell line, colon cancer, and breast cancer gene expression data. The experimental results show that the proposed method can filter out a number of annotations unrelated to the experimental data, increase gene set enrichment power, and decrease the inconsistency of annotations. CONCLUSIONS: A novel gene set annotation optimization approach is proposed to improve the quality of gene annotations. Experimental results indicate that the proposed method effectively improves gene set annotation quality based on the GO structure and gene expression data.
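The summary describes the expression-based filtering step only at a high level. The Python sketch below illustrates one possible way to estimate probabilistic gene clusters over bootstrap-resampled datasets and a range of cluster sizes, then use them to drop annotated genes that rarely co-cluster with the rest of their set. It is not the authors' implementation. The following are all assumptions made for illustration: k-means as the clustering algorithm, resampling samples (columns) with replacement, the mean co-clustering score, the 0.5 threshold, and the hypothetical function names `kmeans`, `co_cluster_probability` and `filter_annotation`. The GO-structure part of the method is not modeled. The sketch requires NumPy.

```python
import numpy as np

def kmeans(X, k, rng, n_iter=50):
    """Plain Lloyd's k-means on the rows (genes) of X; returns one label per row.
    Assumption: k-means stands in for whatever clustering the paper uses."""
    centers = X[rng.choice(len(X), size=k, replace=False)]  # fancy indexing copies X
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Assign each gene to its nearest center (squared Euclidean distance).
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster becomes empty.
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return labels

def co_cluster_probability(expr, k_range, n_boot=20, seed=0):
    """Fraction of (bootstrap, k) runs in which each pair of genes shares a cluster.
    expr is a genes x samples matrix; each bootstrap resamples the samples
    (columns) with replacement, then clusters the genes for every k in k_range."""
    rng = np.random.default_rng(seed)
    n_genes, n_samples = expr.shape
    together = np.zeros((n_genes, n_genes))
    runs = 0
    for _ in range(n_boot):
        cols = rng.integers(0, n_samples, size=n_samples)
        X = expr[:, cols]
        for k in k_range:
            labels = kmeans(X, k, rng)
            together += labels[:, None] == labels[None, :]
            runs += 1
    return together / runs

def filter_annotation(gene_set, prob, threshold=0.5):
    """Hypothetical consistency filter: keep genes whose mean co-clustering
    probability with the other members of the annotated set is >= threshold."""
    idx = np.asarray(gene_set)
    if len(idx) < 2:
        return [int(g) for g in idx]
    sub = prob[np.ix_(idx, idx)]
    # Subtract each gene's self-pair (always 1.0) before averaging.
    mean_other = (sub.sum(axis=1) - 1.0) / (len(idx) - 1)
    return [int(g) for g, m in zip(idx, mean_other) if m >= threshold]

# Usage on synthetic data: two co-expression modules of five genes each.
rng = np.random.default_rng(1)
pattern_a = rng.normal(size=12)
pattern_b = rng.normal(size=12)
expr = np.vstack([
    pattern_a + 0.1 * rng.normal(size=(5, 12)),  # genes 0-4 follow pattern A
    pattern_b + 0.1 * rng.normal(size=(5, 12)),  # genes 5-9 follow pattern B
])
prob = co_cluster_probability(expr, k_range=range(2, 5))
# A hypothetical GO term annotated with genes 0-4 plus gene 7 (an inconsistent member).
kept = filter_annotation([0, 1, 2, 3, 4, 7], prob)
print(kept)
```

In this toy setting, gene 7 belongs to the second module and almost never shares a cluster with genes 0-4, so the filter would be expected to remove it from the hypothetical annotation. Averaging over several cluster sizes and bootstrap resamples is intended to keep the score from depending on a single choice of k or on any one set of samples.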