Metric learning on expression data for gene function prediction

Bibliographic details
Main authors: Makrodimitris, Stavros; Reinders, Marcel J T; van Ham, Roeland C H J
Format: Online, Article, Text
Language: English
Published: Oxford University Press, 2020
Subjects: Original Papers
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7703756/
https://www.ncbi.nlm.nih.gov/pubmed/31562759
http://dx.doi.org/10.1093/bioinformatics/btz731
author Makrodimitris, Stavros
Reinders, Marcel J T
van Ham, Roeland C H J
collection PubMed
description MOTIVATION: Co-expression of two genes across different conditions is indicative of their involvement in the same biological process. However, when using RNA-Seq datasets with many experimental conditions from diverse sources, only a subset of the experimental conditions is expected to be relevant for finding genes related to a particular Gene Ontology (GO) term. Therefore, we hypothesize that when the purpose is to find similarly functioning genes, the co-expression of genes should not be determined on all samples but only on those samples informative for the GO term of interest. RESULTS: To address this, we developed Metric Learning for Co-expression (MLC), a fast algorithm that assigns a GO-term-specific weight to each expression sample. The goal is to obtain a weighted co-expression measure that is more suitable than the unweighted Pearson correlation for applying Guilt-By-Association-based function predictions. More specifically, if two genes are both annotated with a given GO term, MLC tries to maximize their weighted co-expression, whereas if only one of them is annotated with that term, MLC tries to minimize it. Our experiments on publicly available Arabidopsis thaliana RNA-Seq data demonstrate that MLC outperforms standard Pearson correlation in term-centric evaluation. Moreover, our method performs particularly well on more specific terms, which are the most interesting. Finally, by observing the sample weights for a particular GO term, one can identify which experiments are important for learning that term and potentially identify novel conditions that are relevant, as demonstrated by experiments in both A. thaliana and Pseudomonas aeruginosa. AVAILABILITY AND IMPLEMENTATION: MLC is available as a Python package at www.github.com/stamakro/MLC. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-7703756
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
journal Bioinformatics, Original Papers (Oxford University Press)
dates 2020-02-15, 2019-09-28 (record pubmed-7703756, 2020-12-07)
rights © The Author(s) 2019. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Metric learning on expression data for gene function prediction
topic Original Papers
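The description above outlines MLC only at a high level. The following is a minimal Python sketch of the idea, not the MLC package's code. It assumes that the per-sample weights are parameterized by a softmax (positive, summing to 1). It also assumes an objective equal to the mean weighted Pearson correlation of gene pairs that both carry the GO term, minus that of pairs in which exactly one does. Finally, it uses plain gradient ascent with a numerical gradient and backtracking as the optimizer. The function names (`weighted_pearson`, `learn_sample_weights`, `gba_score`) and the toy expression profiles are hypothetical.

```python
import math
from itertools import combinations

# Hypothetical sketch of the idea described above; NOT the MLC package's code.
# Assumptions: softmax-parameterized sample weights, a "positive minus negative
# pairs" objective, and gradient ascent with a numerical gradient.


def softmax(theta):
    """Map unconstrained parameters to positive weights that sum to 1."""
    m = max(theta)  # subtract the max for numerical stability
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_pearson(x, y, w):
    """Pearson correlation of x and y where sample s counts with weight w[s]."""
    mx = sum(wi * xi for wi, xi in zip(w, x))
    my = sum(wi * yi for wi, yi in zip(w, y))
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    if vx == 0 or vy == 0:
        return 0.0  # undefined for a constant profile; treat as uncorrelated
    return cov / math.sqrt(vx * vy)


def objective(expr, annotated, w):
    """Mean weighted co-expression of pairs where both genes carry the term,
    minus that of pairs where exactly one does (needs >= 2 annotated genes
    and >= 1 unannotated gene)."""
    pos, neg = [], []
    for g, h in combinations(sorted(expr), 2):
        r = weighted_pearson(expr[g], expr[h], w)
        n_annotated = (g in annotated) + (h in annotated)
        if n_annotated == 2:
            pos.append(r)
        elif n_annotated == 1:
            neg.append(r)
    return sum(pos) / len(pos) - sum(neg) / len(neg)


def learn_sample_weights(expr, annotated, n_steps=300, lr=1.0, eps=1e-5):
    """Gradient ascent on softmax parameters, starting from uniform weights
    (i.e. ordinary Pearson correlation); only improving steps are accepted."""
    n_samples = len(next(iter(expr.values())))

    def f(theta):
        return objective(expr, annotated, softmax(theta))

    theta = [0.0] * n_samples
    current = f(theta)
    for _ in range(n_steps):
        # Central-difference estimate of the gradient w.r.t. each parameter.
        grad = []
        for s in range(n_samples):
            up = [t + (eps if i == s else 0.0) for i, t in enumerate(theta)]
            down = [t - (eps if i == s else 0.0) for i, t in enumerate(theta)]
            grad.append((f(up) - f(down)) / (2 * eps))
        # Backtracking: halve the step until the objective increases.
        step = lr
        for _ in range(30):
            candidate = [t + step * g for t, g in zip(theta, grad)]
            value = f(candidate)
            if value > current:
                theta, current = candidate, value
                break
            step /= 2
        else:
            break  # no improving step found: treat as converged
    return softmax(theta)


def gba_score(gene, expr, annotated, w):
    """Guilt-by-association: mean weighted co-expression with the other annotated genes."""
    others = [g for g in annotated if g != gene]
    return sum(weighted_pearson(expr[gene], expr[g], w) for g in others) / len(others)


# Invented toy data: A and B carry the term and agree on samples 0-2 only;
# C lacks the term and tracks A on samples 3-5 instead.
expr = {
    "A": [1, 2, 3, 5, 1, 4],
    "B": [1, 2, 3, 1, 5, 2],
    "C": [3, 2, 1, 5, 1, 4],
}
annotated = {"A", "B"}
uniform = [1 / 6] * 6
weights = learn_sample_weights(expr, annotated)
print([round(w, 3) for w in weights])
```

On this toy input the learned weights should move toward samples 0-2, where the two annotated genes agree. Under those weights, the guilt-by-association score of B should exceed that of C. Ranking the largest weights is the small-scale analogue of reading off which experiments matter for a term. The softmax parameterization is one convenient way to keep the weights non-negative and normalized. The actual package may constrain or optimize them differently.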