Stable solution to l(2,1)-based robust inductive matrix completion and its application in linking long noncoding RNAs to human diseases
BACKGROUND: A large number of long intergenic non-coding RNAs (lincRNAs) are linked to a broad spectrum of human diseases. The disease associations of many other lincRNAs still remain a puzzle. Validating such links between the two entities through biological experiments is expensive. However...
Main Authors: | Biswas, Ashis Kumer; Kim, Dongchul; Kang, Mingon; Ding, Chris; Gao, Jean X. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2017 |
Subjects: | Research |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5751820/ https://www.ncbi.nlm.nih.gov/pubmed/29297358 http://dx.doi.org/10.1186/s12920-017-0310-1 |
_version_ | 1783290026260430848 |
---|---|
author | Biswas, Ashis Kumer Kim, Dongchul Kang, Mingon Ding, Chris Gao, Jean X. |
author_facet | Biswas, Ashis Kumer Kim, Dongchul Kang, Mingon Ding, Chris Gao, Jean X. |
author_sort | Biswas, Ashis Kumer |
collection | PubMed |
description | BACKGROUND: A large number of long intergenic non-coding RNAs (lincRNAs) are linked to a broad spectrum of human diseases. The disease associations of many other lincRNAs still remain a puzzle. Validating such links between the two entities through biological experiments is expensive. However, a plethora of lincRNA data is now available, thanks to High Throughput Sequencing (HTS) platforms, Genome Wide Association Studies (GWAS), etc., which opens the opportunity for cutting-edge machine learning and data mining approaches to extract meaningful relationships among lincRNAs and diseases. However, there are only a few in silico lincRNA-disease association inference tools available to date, and none of them utilizes side information of both entities simultaneously in a single framework. METHODS: The recently developed Inductive Matrix Completion (IMC) technique provides a recommendation platform between two entities that considers the respective side information about them. However, the formulation of IMC is incapable of handling noise and outliers that may be present in the datasets, and data sparsity is another issue with the standard IMC method. Thus, a robust version of IMC is needed that can solve the two issues. As a remedy, in this paper, we propose Stable Robust Inductive Matrix Completion (SRIMC), which utilizes l(2,1)-norm based regularization to optimize the objective function with a unique 2-step stable solution approach. RESULTS: We applied SRIMC to the available association data between human lincRNAs and OMIM disease phenotypes as well as a diverse set of side information about the lincRNAs and the diseases. The method performs better than the state-of-the-art methods in terms of precision@k and recall@k at the top-k disease prioritization for the subject lincRNAs. We also demonstrate that SRIMC is equally effective for querying about novel lincRNAs, as well as predicting the rank of a newly known disease for a set of well-characterized lincRNAs. CONCLUSIONS: With the experimental results and computational evaluation, we show that SRIMC is robust in handling datasets with noise and outliers as well as dealing with novel lincRNAs and disease phenotypes. |
format | Online Article Text |
id | pubmed-5751820 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-5751820 2018-01-05 Stable solution to l(2,1)-based robust inductive matrix completion and its application in linking long noncoding RNAs to human diseases Biswas, Ashis Kumer Kim, Dongchul Kang, Mingon Ding, Chris Gao, Jean X. BMC Med Genomics Research BACKGROUND: A large number of long intergenic non-coding RNAs (lincRNAs) are linked to a broad spectrum of human diseases. The disease associations of many other lincRNAs still remain a puzzle. Validating such links between the two entities through biological experiments is expensive. However, a plethora of lincRNA data is now available, thanks to High Throughput Sequencing (HTS) platforms, Genome Wide Association Studies (GWAS), etc., which opens the opportunity for cutting-edge machine learning and data mining approaches to extract meaningful relationships among lincRNAs and diseases. However, there are only a few in silico lincRNA-disease association inference tools available to date, and none of them utilizes side information of both entities simultaneously in a single framework. METHODS: The recently developed Inductive Matrix Completion (IMC) technique provides a recommendation platform between two entities that considers the respective side information about them. However, the formulation of IMC is incapable of handling noise and outliers that may be present in the datasets, and data sparsity is another issue with the standard IMC method. Thus, a robust version of IMC is needed that can solve the two issues. As a remedy, in this paper, we propose Stable Robust Inductive Matrix Completion (SRIMC), which utilizes l(2,1)-norm based regularization to optimize the objective function with a unique 2-step stable solution approach. RESULTS: We applied SRIMC to the available association data between human lincRNAs and OMIM disease phenotypes as well as a diverse set of side information about the lincRNAs and the diseases. The method performs better than the state-of-the-art methods in terms of precision@k and recall@k at the top-k disease prioritization for the subject lincRNAs. We also demonstrate that SRIMC is equally effective for querying about novel lincRNAs, as well as predicting the rank of a newly known disease for a set of well-characterized lincRNAs. CONCLUSIONS: With the experimental results and computational evaluation, we show that SRIMC is robust in handling datasets with noise and outliers as well as dealing with novel lincRNAs and disease phenotypes. BioMed Central 2017-12-28 /pmc/articles/PMC5751820/ /pubmed/29297358 http://dx.doi.org/10.1186/s12920-017-0310-1 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Biswas, Ashis Kumer Kim, Dongchul Kang, Mingon Ding, Chris Gao, Jean X. Stable solution to l(2,1)-based robust inductive matrix completion and its application in linking long noncoding RNAs to human diseases |
title | Stable solution to l(2,1)-based robust inductive matrix completion and its application in linking long noncoding RNAs to human diseases |
title_full | Stable solution to l(2,1)-based robust inductive matrix completion and its application in linking long noncoding RNAs to human diseases |
title_fullStr | Stable solution to l(2,1)-based robust inductive matrix completion and its application in linking long noncoding RNAs to human diseases |
title_full_unstemmed | Stable solution to l(2,1)-based robust inductive matrix completion and its application in linking long noncoding RNAs to human diseases |
title_short | Stable solution to l(2,1)-based robust inductive matrix completion and its application in linking long noncoding RNAs to human diseases |
title_sort | stable solution to l(2,1)-based robust inductive matrix completion and its application in linking long noncoding rnas to human diseases |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5751820/ https://www.ncbi.nlm.nih.gov/pubmed/29297358 http://dx.doi.org/10.1186/s12920-017-0310-1 |
work_keys_str_mv | AT biswasashiskumer stablesolutiontol21basedrobustinductivematrixcompletionanditsapplicationinlinkinglongnoncodingrnastohumandiseases AT kimdongchul stablesolutiontol21basedrobustinductivematrixcompletionanditsapplicationinlinkinglongnoncodingrnastohumandiseases AT kangmingon stablesolutiontol21basedrobustinductivematrixcompletionanditsapplicationinlinkinglongnoncodingrnastohumandiseases AT dingchris stablesolutiontol21basedrobustinductivematrixcompletionanditsapplicationinlinkinglongnoncodingrnastohumandiseases AT gaojeanx stablesolutiontol21basedrobustinductivematrixcompletionanditsapplicationinlinkinglongnoncodingrnastohumandiseases |
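
The description above reports results in terms of precision@k and recall@k for top-k disease prioritization. As a rough illustration of how such metrics are commonly computed for a single query, here is a minimal Python sketch. It is not taken from the article: the function name `precision_recall_at_k`, the disease identifiers, and the choice to divide precision by k (rather than by the number of items actually returned) are all assumptions made for this example.

```python
from typing import Sequence, Set, Tuple


def precision_recall_at_k(
    ranked: Sequence[str], relevant: Set[str], k: int
) -> Tuple[float, float]:
    """Return (precision@k, recall@k) for one query, e.g. one lincRNA.

    ranked   -- candidate items ordered from highest to lowest predicted score
    relevant -- items known to be truly associated with the query
    k        -- size of the top-ranked cutoff (assumed convention: divide by k)
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked[:k]
    # Count how many of the top-k predictions are known associations.
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / k
    # With no known associations, recall is undefined; report 0.0 here.
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data: disease IDs ranked by predicted score for one lincRNA.
ranked_diseases = ["D3", "D1", "D7", "D2", "D5"]
known_diseases = {"D1", "D2", "D9"}

p, r = precision_recall_at_k(ranked_diseases, known_diseases, k=3)
print(f"precision@3={p:.3f} recall@3={r:.3f}")
```

In an evaluation over many lincRNAs, these per-query values would typically be averaged; whether the article averages them in this way is not stated in the record above.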