
Gene expression prediction using low-rank matrix completion

Bibliographic details
Main authors: Kapur, Arnav; Marwah, Kshitij; Alterovitz, Gil
Format: Online article, text
Language: English
Published: BioMed Central, 2016
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4912738/
https://www.ncbi.nlm.nih.gov/pubmed/27317252
http://dx.doi.org/10.1186/s12859-016-1106-6
Collection: PubMed

Abstract:
BACKGROUND: High-throughput biological information and data have grown exponentially over the past decade, supported by technologies such as microarrays and RNA-Seq. Most data generated with these methods encode large amounts of rich information and are used to determine diagnostic and prognostic biomarkers. Although data storage costs have fallen, capturing data with these technologies is still expensive. Moreover, the time required for the assay, from sample preparation to raw value measurement, is excessive (on the order of days). There is an opportunity to reduce both the cost and the time of generating such expression datasets.

RESULTS: We propose a framework in which complete gene expression values can be reliably predicted in silico from partial measurements. This is achieved by modelling expression data as a low-rank matrix and then applying recently developed matrix completion techniques based on nonlinear convex optimisation. We evaluated the prediction of gene expression data on 133 studies with a combined total of 10,921 samples. We show that such datasets can be constructed with low relative error even at high missing-value rates (>50%), and that the predicted datasets can reliably serve as surrogates for further analysis.

CONCLUSION: This method has potentially far-reaching applications, including in how biomedical data are sourced and generated, and in transcriptomic prediction by optimisation. We show that gene expression data can be computationally constructed, potentially reducing the costs of gene expression profiling. In conclusion, this method shows great promise for opening new avenues of research on low-rank matrix completion in the biological sciences.

ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-1106-6) contains supplementary material, which is available to authorized users.

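The abstract describes completing a partially measured expression matrix by treating it as low rank and solving a convex optimisation problem, without naming the solver. As a rough illustration only, the sketch below uses SoftImpute-style iterative singular value soft-thresholding, one standard convex (nuclear-norm regularised) approach to low-rank matrix completion; it is not claimed to be the authors' formulation. It hides about 50% of the entries of a synthetic rank-3 matrix (not gene expression data, and no result from the paper is reproduced) and measures the relative error on the hidden entries. The function names `soft_impute` and `relative_error`, the `shrink` value, and the iteration limits are hypothetical choices.

```python
import numpy as np


def soft_impute(measured, mask, shrink, n_iters=500, tol=1e-7):
    """Nuclear-norm regularised matrix completion by iterative soft-thresholding.

    measured: 2-D array; entries where mask is False are ignored (may be NaN)
    mask:     boolean array of the same shape, True where a value was measured
    shrink:   amount subtracted from every singular value (regularisation strength)
    """
    estimate = np.zeros(measured.shape)
    for _ in range(n_iters):
        # Keep the measured entries; fill the gaps with the current estimate.
        filled = np.where(mask, measured, estimate)
        u, s, vt = np.linalg.svd(filled, full_matrices=False)
        # Soft-threshold the singular values: small ones drop to zero,
        # which pushes the estimate towards low rank.
        new_estimate = (u * np.maximum(s - shrink, 0.0)) @ vt
        change = np.linalg.norm(new_estimate - estimate)
        scale = max(np.linalg.norm(estimate), 1e-12)
        estimate = new_estimate
        if change / scale < tol:
            break
    return estimate


def relative_error(truth, estimate, where):
    """Frobenius-norm relative error restricted to the entries selected by `where`."""
    diff = (truth - estimate)[where]
    return np.linalg.norm(diff) / np.linalg.norm(truth[where])


# Hypothetical stand-in for an expression matrix: 200 "genes" x 60 "samples", rank 3.
rng = np.random.default_rng(0)
truth = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 60))

# Hide roughly half of the entries (mask is True where a value is kept).
mask = rng.random(truth.shape) < 0.5
measured = np.where(mask, truth, np.nan)

estimate = soft_impute(measured, mask, shrink=2.0)
err = relative_error(truth, estimate, ~mask)
print(f"relative error on hidden entries: {err:.3f}")
```

Larger `shrink` values give lower-rank estimates but bias the recovered values towards zero; SoftImpute-style solvers commonly run a decreasing sequence of thresholds with warm starts instead of a single fixed value.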
Record ID: pubmed-4912738
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: BMC Bioinformatics
Publication date: 2016-06-17
Rights: © Kapur et al. 2016. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Article type: Research Article