Missing value imputation for microRNA expression data by using a GO-based similarity measure

BACKGROUND: Missing values are common in microarray data profiles. Rather than discarding genes or samples with incomplete expression levels, missing values should be imputed properly so that data analysis remains accurate. Imputation methods can be roughly categorized as expression level-based and...

Bibliographic Details
Main authors: Yang, Yang; Xu, Zhuangdi; Song, Dandan
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4895707/
https://www.ncbi.nlm.nih.gov/pubmed/26818962
http://dx.doi.org/10.1186/s12859-015-0853-0
author Yang, Yang
Xu, Zhuangdi
Song, Dandan
collection PubMed
description BACKGROUND: Missing values are common in microarray data profiles. Rather than discarding genes or samples with incomplete expression levels, missing values should be imputed properly so that data analysis remains accurate. Imputation methods can be roughly categorized as expression level-based and domain knowledge-based. Methods of the first type rely only on expression data, without the help of external data sources, while methods of the second type combine available domain knowledge with expression data to improve imputation accuracy. In recent years, microRNA (miRNA) microarrays have been widely developed and used to identify miRNA biomarkers in studies of complex human diseases. Like mRNA profiles, miRNA expression profiles with missing values can be treated with the existing imputation methods. However, domain knowledge-based methods are hard to apply because miRNAs lack direct functional annotation. With the rapid accumulation of miRNA microarray data, there is a growing need for domain knowledge-based imputation algorithms specific to miRNA expression profiles, in order to improve the quality of miRNA data analysis. RESULTS: We connect miRNAs with the domain knowledge of Gene Ontology (GO) via their target genes, and define miRNA functional similarity based on the semantic similarity of GO terms in GO graphs. A new measure combining miRNA functional similarity and expression similarity is used to impute missing values. The new measure is tested on two miRNA microarray datasets from breast cancer research and performs better than the expression-based method on both datasets. CONCLUSIONS: The experimental results demonstrate that biological domain knowledge can benefit the estimation of missing values in miRNA profiles as well as in mRNA profiles. In particular, functional similarity defined by the GO terms annotated to the target genes of miRNAs can serve as useful complementary information for the expression-based method, improving the imputation accuracy of miRNA array data. Our method and data are available to the public upon request.
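The idea summarized in RESULTS, blending a functional similarity with an expression similarity and using the blend to pick neighbours for imputation, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the imputation algorithm, so a weighted k-nearest-neighbour scheme is assumed here, and the weight `alpha`, the function names, and the toy `func_sim` matrix (which stands in for a precomputed GO-based similarity) are all hypothetical.

```python
import numpy as np

def expression_similarity(X):
    """Pairwise row similarity from RMS distance over jointly observed columns."""
    n = X.shape[0]
    sim = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            mask = ~np.isnan(X[i]) & ~np.isnan(X[j])
            if mask.sum() == 0:
                continue  # no shared observations: leave similarity at 0
            d = np.sqrt(np.mean((X[i, mask] - X[j, mask]) ** 2))
            sim[i, j] = 1.0 / (1.0 + d)
    return sim

def knn_impute(X, func_sim, k=2, alpha=0.5):
    """Fill NaNs with a similarity-weighted mean of the k most similar rows.

    alpha (hypothetical) weights expression similarity against the
    precomputed functional similarity func_sim.
    """
    X = np.asarray(X, dtype=float)
    sim = alpha * expression_similarity(X) + (1.0 - alpha) * np.asarray(func_sim)
    out = X.copy()
    for i, j in zip(*np.where(np.isnan(X))):
        # Only rows that actually observe column j can donate a value.
        donors = [r for r in range(X.shape[0]) if r != i and not np.isnan(X[r, j])]
        if not donors:
            continue  # nothing to impute from; leave the gap
        donors.sort(key=lambda r: sim[i, r], reverse=True)
        top = donors[:k]
        weights = np.array([sim[i, r] for r in top])
        values = np.array([X[r, j] for r in top])
        out[i, j] = np.average(values, weights=weights) if weights.sum() > 0 else values.mean()
    return out

# Toy data: rows are miRNAs, columns are samples; one value is missing.
X = np.array([[1.0, 2.0, 3.0],
              [1.0, np.nan, 3.0],
              [5.0, 6.0, 7.0]])
# Made-up functional similarity; in practice this would come from GO annotations.
func_sim = np.array([[1.0, 0.9, 0.1],
                     [0.9, 1.0, 0.1],
                     [0.1, 0.1, 1.0]])
imputed = knn_impute(X, func_sim, k=2, alpha=0.5)
```

In this toy case the second row is close to the first in both expression and assumed function, so the imputed value is pulled toward the first row's observation. Setting `alpha=1.0` reduces the sketch to a purely expression-based neighbour search.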
format Online
Article
Text
id pubmed-4895707
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4895707 2016-06-10 Missing value imputation for microRNA expression data by using a GO-based similarity measure Yang, Yang Xu, Zhuangdi Song, Dandan BMC Bioinformatics Proceedings BioMed Central 2016-01-11 /pmc/articles/PMC4895707/ /pubmed/26818962 http://dx.doi.org/10.1186/s12859-015-0853-0 Text en © Yang et al. 2015 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Missing value imputation for microRNA expression data by using a GO-based similarity measure
topic Proceedings