Stable solution to l(2,1)-based robust inductive matrix completion and its application in linking long noncoding RNAs to human diseases

BACKGROUNDS: A large number of long intergenic non-coding RNAs (lincRNAs) are linked to a broad spectrum of human diseases. The disease associations of many other lincRNAs still remain a puzzle. Validating such links between the two entities through biological experiments is expensive. However...

Bibliographic Details
Main authors: Biswas, Ashis Kumer, Kim, Dongchul, Kang, Mingon, Ding, Chris, Gao, Jean X.
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5751820/
https://www.ncbi.nlm.nih.gov/pubmed/29297358
http://dx.doi.org/10.1186/s12920-017-0310-1
author Biswas, Ashis Kumer
Kim, Dongchul
Kang, Mingon
Ding, Chris
Gao, Jean X.
collection PubMed
description BACKGROUNDS: A large number of long intergenic non-coding RNAs (lincRNAs) are linked to a broad spectrum of human diseases. The disease associations of many other lincRNAs still remain a puzzle. Validating such links between the two entities through biological experiments is expensive. However, a plethora of lincRNA data is now available, thanks to High Throughput Sequencing (HTS) platforms, Genome Wide Association Studies (GWAS), etc., which opens the opportunity for cutting-edge machine learning and data mining approaches to extract meaningful relationships between lincRNAs and diseases. However, only a few in silico lincRNA-disease association inference tools are available to date, and none of them utilizes side information about both entities simultaneously in a single framework. METHODS: The recently developed Inductive Matrix Completion (IMC) technique provides a recommendation platform between two entities that considers the side information about each of them. However, the formulation of IMC cannot handle noise and outliers that may be present in the datasets, and data sparsity is another issue with the standard IMC method. Thus, a robust version of IMC is needed that can solve both issues. As a remedy, in this paper, we propose Stable Robust Inductive Matrix Completion (SRIMC), which uses l(2,1)-norm-based regularization to optimize the objective function with a unique 2-step stable solution approach. RESULTS: We applied SRIMC to the available association data between human lincRNAs and OMIM disease phenotypes as well as a diverse set of side information about the lincRNAs and the diseases. The method performs better than the state-of-the-art methods in terms of precision@k and recall@k in top-k disease prioritization for the subject lincRNAs. We also demonstrate that SRIMC is equally effective for queries about novel lincRNAs, as well as for predicting the rank of a newly known disease for a set of well-characterized lincRNAs. CONCLUSIONS: With the experimental results and computational evaluation, we show that SRIMC is robust in handling datasets with noise and outliers as well as in dealing with novel lincRNAs and disease phenotypes.
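The abstract names two quantities that are easy to misread without a concrete definition: the l(2,1) norm used for regularization and the precision@k / recall@k scores used for evaluation. The following is a minimal sketch of both, not the authors' implementation. It assumes the common row-wise convention for the l(2,1) norm (the sum of the Euclidean norms of the rows), which this record does not confirm, and the function names `l21_norm` and `precision_recall_at_k` are hypothetical.

```python
import numpy as np


def l21_norm(matrix):
    """l(2,1) norm under the row-wise convention (an assumption here):
    the sum over rows of each row's Euclidean norm."""
    matrix = np.asarray(matrix, dtype=float)
    return float(np.sqrt((matrix ** 2).sum(axis=1)).sum())


def precision_recall_at_k(scores, relevant, k):
    """Precision@k and recall@k for one query (e.g. one lincRNA).

    scores   : 1-D array, one predicted score per candidate disease
    relevant : set of indices of the truly associated diseases
    k        : number of top-ranked candidates to inspect
    """
    scores = np.asarray(scores, dtype=float)
    # Rank candidates by descending score; a stable sort keeps ties in index order.
    top_k = np.argsort(-scores, kind="stable")[:k]
    hits = sum(1 for j in top_k if int(j) in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

For example, with scores `[0.9, 0.1, 0.8, 0.3]`, true associations at indices 0 and 3, and k = 2, the top two candidates are indices 0 and 2, so one of the two known associations is recovered.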
format Online
Article
Text
id pubmed-5751820
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5751820 2018-01-05 BMC Med Genomics Research BioMed Central 2017-12-28 /pmc/articles/PMC5751820/ /pubmed/29297358 http://dx.doi.org/10.1186/s12920-017-0310-1 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Stable solution to l(2,1)-based robust inductive matrix completion and its application in linking long noncoding RNAs to human diseases
topic Research