
Gene Prioritization by Compressive Data Fusion and Chaining

Bibliographic Details
Main authors: Žitnik, Marinka; Nam, Edward A.; Dinh, Christopher; Kuspa, Adam; Shaulsky, Gad; Zupan, Blaž
Format: Online, Article, Text
Language: English
Published: Public Library of Science, 2015
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4605714/
https://www.ncbi.nlm.nih.gov/pubmed/26465776
http://dx.doi.org/10.1371/journal.pcbi.1004552
collection PubMed
description Data integration procedures combine heterogeneous data sets into predictive models, but they are limited to data explicitly related to the target object type, such as genes. Collage is a new data fusion approach to gene prioritization. It considers data sets of various association levels with the prediction task, utilizes collective matrix factorization to compress the data, and chaining to relate different object types contained in a data compendium. Collage prioritizes genes based on their similarity to several seed genes. We tested Collage by prioritizing bacterial response genes in Dictyostelium as a novel model system for prokaryote-eukaryote interactions. Using 4 seed genes and 14 data sets, only one of which was directly related to the bacterial response, Collage proposed 8 candidate genes that were readily validated as necessary for the response of Dictyostelium to Gram-negative bacteria. These findings establish Collage as a method for inferring biological knowledge from the integration of heterogeneous and coarsely related data sets.
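The description names the steps (compress the data sets by collective matrix factorization, chain latent factors to relate object types, rank genes by similarity to seed genes) but does not spell out the scoring. The following is a minimal Python sketch of the last two steps under stated assumptions: gene latent factors `G` and a backbone matrix `S` are taken as already computed (the numbers are toy values, not data from the study), chaining is modeled as the product `G @ S`, and candidates are scored by mean cosine similarity to the seeds. The helper names `chain_profiles` and `prioritize` are hypothetical and are not Collage's actual code.

```python
from math import sqrt
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    nu = sqrt(sum(x * x for x in u))
    nv = sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)


def chain_profiles(gene_factors: Matrix, backbones: List[Matrix]) -> Matrix:
    """Hypothetical chaining: project gene latent factors G through a
    sequence of backbone matrices (G @ S_1 @ S_2 @ ...), so that genes are
    described in the latent space of a related object type."""
    profile = gene_factors
    for s in backbones:
        profile = matmul(profile, s)
    return profile


def prioritize(genes: List[str], profiles: Matrix,
               seeds: List[str]) -> List[Tuple[str, float]]:
    """Rank non-seed genes by mean cosine similarity to the seed genes."""
    index = {g: i for i, g in enumerate(genes)}
    seed_rows = [profiles[index[s]] for s in seeds]
    scores = {}
    for g in genes:
        if g in seeds:
            continue  # seeds are already known; only candidates are ranked
        row = profiles[index[g]]
        scores[g] = sum(cosine(row, s) for s in seed_rows) / len(seed_rows)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy usage (hypothetical values): four genes, 2-dimensional factors, one backbone.
genes = ["g1", "g2", "g3", "g4"]
G = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.8, 0.2]]
S = [[1.0, 0.5], [0.0, 1.0]]
profiles = chain_profiles(G, [S])
ranking = prioritize(genes, profiles, seeds=["g1", "g2"])
print(ranking)  # g4 ranks above g3
```

Averaging over seeds is only one simple choice; taking the maximum similarity over seeds, or weighting seeds unequally, would be equally plausible variants of the same idea.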
id pubmed-4605714
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal PLoS Comput Biol (Research Article), published 2015-10-14
rights © 2015 Žitnik et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.