Gene Prioritization by Compressive Data Fusion and Chaining
Data integration procedures combine heterogeneous data sets into predictive models, but they are limited to data explicitly related to the target object type, such as genes. Collage is a new data fusion approach to gene prioritization. It considers data sets of various association levels with the pr...
Main Authors: | Žitnik, Marinka; Nam, Edward A.; Dinh, Christopher; Kuspa, Adam; Shaulsky, Gad; Zupan, Blaž |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2015 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4605714/ https://www.ncbi.nlm.nih.gov/pubmed/26465776 http://dx.doi.org/10.1371/journal.pcbi.1004552 |
_version_ | 1782395246422786048 |
---|---|
author | Žitnik, Marinka Nam, Edward A. Dinh, Christopher Kuspa, Adam Shaulsky, Gad Zupan, Blaž |
author_facet | Žitnik, Marinka Nam, Edward A. Dinh, Christopher Kuspa, Adam Shaulsky, Gad Zupan, Blaž |
author_sort | Žitnik, Marinka |
collection | PubMed |
description | Data integration procedures combine heterogeneous data sets into predictive models, but they are limited to data explicitly related to the target object type, such as genes. Collage is a new data fusion approach to gene prioritization. It considers data sets of various association levels with the prediction task, utilizes collective matrix factorization to compress the data, and chaining to relate different object types contained in a data compendium. Collage prioritizes genes based on their similarity to several seed genes. We tested Collage by prioritizing bacterial response genes in Dictyostelium as a novel model system for prokaryote-eukaryote interactions. Using 4 seed genes and 14 data sets, only one of which was directly related to the bacterial response, Collage proposed 8 candidate genes that were readily validated as necessary for the response of Dictyostelium to Gram-negative bacteria. These findings establish Collage as a method for inferring biological knowledge from the integration of heterogeneous and coarsely related data sets. |
format | Online Article Text |
id | pubmed-4605714 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-46057142015-10-29 Gene Prioritization by Compressive Data Fusion and Chaining Žitnik, Marinka Nam, Edward A. Dinh, Christopher Kuspa, Adam Shaulsky, Gad Zupan, Blaž PLoS Comput Biol Research Article Data integration procedures combine heterogeneous data sets into predictive models, but they are limited to data explicitly related to the target object type, such as genes. Collage is a new data fusion approach to gene prioritization. It considers data sets of various association levels with the prediction task, utilizes collective matrix factorization to compress the data, and chaining to relate different object types contained in a data compendium. Collage prioritizes genes based on their similarity to several seed genes. We tested Collage by prioritizing bacterial response genes in Dictyostelium as a novel model system for prokaryote-eukaryote interactions. Using 4 seed genes and 14 data sets, only one of which was directly related to the bacterial response, Collage proposed 8 candidate genes that were readily validated as necessary for the response of Dictyostelium to Gram-negative bacteria. These findings establish Collage as a method for inferring biological knowledge from the integration of heterogeneous and coarsely related data sets. Public Library of Science 2015-10-14 /pmc/articles/PMC4605714/ /pubmed/26465776 http://dx.doi.org/10.1371/journal.pcbi.1004552 Text en © 2015 Žitnik et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
spellingShingle | Research Article Žitnik, Marinka Nam, Edward A. Dinh, Christopher Kuspa, Adam Shaulsky, Gad Zupan, Blaž Gene Prioritization by Compressive Data Fusion and Chaining |
title | Gene Prioritization by Compressive Data Fusion and Chaining |
title_full | Gene Prioritization by Compressive Data Fusion and Chaining |
title_fullStr | Gene Prioritization by Compressive Data Fusion and Chaining |
title_full_unstemmed | Gene Prioritization by Compressive Data Fusion and Chaining |
title_short | Gene Prioritization by Compressive Data Fusion and Chaining |
title_sort | gene prioritization by compressive data fusion and chaining |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4605714/ https://www.ncbi.nlm.nih.gov/pubmed/26465776 http://dx.doi.org/10.1371/journal.pcbi.1004552 |
work_keys_str_mv | AT zitnikmarinka geneprioritizationbycompressivedatafusionandchaining AT namedwarda geneprioritizationbycompressivedatafusionandchaining AT dinhchristopher geneprioritizationbycompressivedatafusionandchaining AT kuspaadam geneprioritizationbycompressivedatafusionandchaining AT shaulskygad geneprioritizationbycompressivedatafusionandchaining AT zupanblaz geneprioritizationbycompressivedatafusionandchaining |
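
The description field says that the method chains data sets to relate different object types and then prioritizes genes by their similarity to a few seed genes. The sketch below illustrates only that general idea, and it is not the authors' implementation. It assumes chaining can be approximated by multiplying raw relation matrices along a path of object types, and it skips the collective matrix factorization step, which the record does not describe in enough detail to reproduce. All names (`chain`, `rank_by_seed_similarity`) and the toy matrices are hypothetical.

```python
import numpy as np


def chain(*relations):
    """Multiply relation matrices along a path of object types.

    Assumption: each matrix relates one object type (rows) to the next
    (columns), e.g. genes -> terms, then terms -> pathways.
    """
    profile = relations[0]
    for relation in relations[1:]:
        profile = profile @ relation
    return profile


def rank_by_seed_similarity(profiles, seed_idx):
    """Rank non-seed rows by mean cosine similarity to the seed rows."""
    norms = np.linalg.norm(profiles, axis=1, keepdims=True)
    unit = profiles / np.where(norms == 0, 1.0, norms)  # avoid division by zero
    scores = (unit @ unit[seed_idx].T).mean(axis=1)
    seeds = set(seed_idx)
    candidates = [i for i in range(len(profiles)) if i not in seeds]
    return sorted(candidates, key=lambda i: -scores[i])


# Hypothetical toy data: 4 genes, 2 terms, 3 pathways.
gene_term = np.array([
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [0.8, 0.2],
])
term_pathway = np.array([
    [1.0, 0.0, 0.5],
    [0.0, 1.0, 0.5],
])

# Gene profiles expressed in pathway space via the chain genes -> terms -> pathways.
profiles = chain(gene_term, term_pathway)

# Gene 0 is the only seed; the remaining genes are ranked against it.
ranking = rank_by_seed_similarity(profiles, seed_idx=[0])
print(ranking)
```

Here the indirectly related pathway data contributes to the ranking only through the chain, which is the point of the illustration; the published method may weight, compress, or combine chains differently.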