
Orthogonal matrix factorization enables integrative analysis of multiple RNA binding proteins

Motivation: RNA binding proteins (RBPs) play important roles in post-transcriptional control of gene expression, including splicing, transport, polyadenylation and RNA stability. To model protein–RNA interactions by considering all available sources of information, it is necessary to integrate the r...

Bibliographic Details
Main authors: Stražar, Martin, Žitnik, Marinka, Zupan, Blaž, Ule, Jernej, Curk, Tomaž
Format: Online Article Text
Language: English
Published: Oxford University Press 2016
Subjects: Original Papers
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4894278/
https://www.ncbi.nlm.nih.gov/pubmed/26787667
http://dx.doi.org/10.1093/bioinformatics/btw003
author Stražar, Martin
Žitnik, Marinka
Zupan, Blaž
Ule, Jernej
Curk, Tomaž
author_sort Stražar, Martin
collection PubMed
description Motivation: RNA binding proteins (RBPs) play important roles in post-transcriptional control of gene expression, including splicing, transport, polyadenylation and RNA stability. To model protein–RNA interactions by considering all available sources of information, it is necessary to integrate the rapidly growing RBP experimental data with the latest genome annotation, gene function, RNA sequence and structure. Such integration is possible by matrix factorization, where current approaches have an undesired tendency to identify only a small number of the strongest patterns with overlapping features. Because protein–RNA interactions are orchestrated by multiple factors, methods that identify discriminative patterns of varying strengths are needed.
Results: We have developed an integrative orthogonality-regularized nonnegative matrix factorization (iONMF) to integrate multiple data sources and discover non-overlapping, class-specific RNA binding patterns of varying strengths. The orthogonality constraint halves the effective size of the factor model and outperforms other NMF models in predicting RBP interaction sites on RNA. We have integrated the largest data compendium to date, which includes 31 CLIP experiments on 19 RBPs involved in splicing (such as hnRNPs, U2AF2, ELAVL1, TDP-43 and FUS) and processing of 3′ UTR (Ago, IGF2BP). We show that the integration of multiple data sources improves the predictive accuracy of retrieval of RNA binding sites. In our study the key predictive factors of protein–RNA interactions were the position of RNA structure and sequence motifs, RBP co-binding and gene region type. We report on a number of protein-specific patterns, many of which are consistent with experimentally determined properties of RBPs.
Availability and implementation: The iONMF implementation and example datasets are available at https://github.com/mstrazar/ionmf.
Contact: tomaz.curk@fri.uni-lj.si
Supplementary information: Supplementary data are available at Bioinformatics online.
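As a rough illustration of the idea named in the description, an orthogonality-regularized NMF that shares one factor across several data matrices might be sketched as follows. This is a minimal sketch and not the authors' implementation (that is at the GitHub URL given above). The function name `ionmf_sketch`, the assumed objective sum_i ||X_i - W H_i||_F^2 + alpha * ||H_i H_i^T - I||_F^2, the heuristic multiplicative updates, and all parameter defaults are assumptions made for illustration only.

```python
import numpy as np


def ionmf_sketch(Xs, rank, alpha=0.1, n_iter=200, eps=1e-9, seed=0):
    """Hypothetical sketch: factorize each nonnegative X_i (n x m_i) as W @ H_i.

    W (n x rank) is shared by all data sources; each H_i (rank x m_i) is
    pushed towards orthogonal rows by the penalty alpha * ||H_i H_i^T - I||_F^2.
    """
    rng = np.random.default_rng(seed)
    n = Xs[0].shape[0]
    # Strictly positive random initialization keeps the updates well defined.
    W = rng.random((n, rank)) + eps
    Hs = [rng.random((rank, X.shape[1])) + eps for X in Xs]

    for _ in range(n_iter):
        # Shared factor: standard multiplicative NMF update summed over sources.
        num = sum(X @ H.T for X, H in zip(Xs, Hs))
        den = W @ sum(H @ H.T for H in Hs) + eps
        W *= num / den

        # Source-specific factors: negative gradient terms go in the numerator,
        # positive gradient terms in the denominator (a heuristic update).
        WtW = W.T @ W
        for i, X in enumerate(Xs):
            H = Hs[i]
            num = W.T @ X + 2.0 * alpha * H
            den = WtW @ H + 2.0 * alpha * (H @ H.T @ H) + eps
            Hs[i] = H * num / den
    return W, Hs


# Toy usage with two random nonnegative "data sources" over the same 20 rows.
rng = np.random.default_rng(1)
X1 = rng.random((20, 8))
X2 = rng.random((20, 5))
W, Hs = ionmf_sketch([X1, X2], rank=3)
```

Because every update multiplies by a ratio of nonnegative quantities, W and each H_i stay nonnegative throughout. A larger `alpha` weights the orthogonality penalty more heavily relative to the reconstruction error.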
format Online
Article
Text
id pubmed-4894278
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-4894278 2016-06-07 Orthogonal matrix factorization enables integrative analysis of multiple RNA binding proteins Stražar, Martin Žitnik, Marinka Zupan, Blaž Ule, Jernej Curk, Tomaž Bioinformatics Original Papers Oxford University Press 2016-05-15 2016-01-18 /pmc/articles/PMC4894278/ /pubmed/26787667 http://dx.doi.org/10.1093/bioinformatics/btw003 Text en © The Author 2016. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Orthogonal matrix factorization enables integrative analysis of multiple RNA binding proteins
topic Original Papers