Inferring mouse gene functions from genomic-scale data using a combined functional network/classification strategy
Main authors: | Kim, Wan Kyu; Krumpelman, Chase; Marcotte, Edward M |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2008 |
Subjects: | Method |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2447539/ https://www.ncbi.nlm.nih.gov/pubmed/18613949 http://dx.doi.org/10.1186/gb-2008-9-s1-s5 |
_version_ | 1782156962821046272 |
---|---|
author | Kim, Wan Kyu; Krumpelman, Chase; Marcotte, Edward M |
author_facet | Kim, Wan Kyu; Krumpelman, Chase; Marcotte, Edward M |
author_sort | Kim, Wan Kyu |
collection | PubMed |
description | The complete set of mouse genes, as with the set of human genes, is still largely uncharacterized, with many pieces of experimental evidence accumulating regarding the activities and expression of the genes, but the majority of genes as yet still of unknown function. Within the context of the MouseFunc competition, we developed and applied two distinct large-scale data mining approaches to infer the functions (Gene Ontology annotations) of mouse genes from experimental observations from available functional genomics, proteomics, comparative genomics, and phenotypic data. The two strategies — the first using classifiers to map features to annotations, the second propagating annotations from characterized genes to uncharacterized genes along edges in a network constructed from the features — offer alternative and possibly complementary approaches to providing functional annotations. Here, we re-implement and evaluate these approaches and their combination for their ability to predict the proper functional annotations of genes in the MouseFunc data set. We show that, when controlling for the same set of input features, the network approach generally outperformed a naïve Bayesian classifier approach, while their combination offers some improvement over either independently. We make our observations of predictive performance on the MouseFunc competition hold-out set, as well as on a ten-fold cross-validation of the MouseFunc data. Across all 1,339 annotated genes in the MouseFunc test set, the median predictive power was quite strong (median area under a receiver operating characteristic plot of 0.865 and average precision of 0.195), indicating that a mining-based strategy with existing data is a promising path towards discovering mammalian gene functions. 
As one product of this work, a high-confidence subset of the functional mouse gene network was produced — spanning >70% of mouse genes with >1.6 million associations — that is predictive of mouse (and therefore often human) gene function and functional associations. The network should be generally useful for mammalian gene functional analyses, such as for predicting interactions, inferring functional connections between genes and pathways, and prioritizing candidate genes. The network and all predictions are available on the worldwide web. |
format | Text |
id | pubmed-2447539 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2008 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-2447539 2008-07-10 Genome Biol Method BioMed Central 2008 2008-06-27 /pmc/articles/PMC2447539/ /pubmed/18613949 http://dx.doi.org/10.1186/gb-2008-9-s1-s5 Text en Copyright © 2008 Kim et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Inferring mouse gene functions from genomic-scale data using a combined functional network/classification strategy |
topic | Method |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2447539/ https://www.ncbi.nlm.nih.gov/pubmed/18613949 http://dx.doi.org/10.1186/gb-2008-9-s1-s5 |
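The description field above mentions propagating annotations from characterized genes to uncharacterized genes along network edges, and reports performance as area under the ROC curve and average precision. The sketch below illustrates that general idea on a toy network. It is not the authors' implementation: the simple neighbor-weight scoring rule, the gene names, the edge weights, and all function names are assumptions made up for illustration.

```python
from collections import defaultdict


def neighbor_vote_scores(edges, annotated, genes):
    """Score each gene by the summed weight of its edges to annotated genes.

    This is a simple guilt-by-association rule chosen for illustration only.
    """
    adjacency = defaultdict(dict)
    for a, b, weight in edges:
        adjacency[a][b] = weight
        adjacency[b][a] = weight
    return {
        g: sum(w for n, w in adjacency[g].items() if n in annotated)
        for g in genes
    }


def roc_auc(scores, truth):
    """Probability that a random positive outranks a random negative (ties = 0.5)."""
    pos = [s for s, t in zip(scores, truth) if t]
    neg = [s for s, t in zip(scores, truth) if not t]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(scores, truth):
    """Mean of the precision values measured at the rank of each true positive."""
    ranked = sorted(zip(scores, truth), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, is_positive) in enumerate(ranked, start=1):
        if is_positive:
            hits += 1
            total += hits / rank
    return total / hits


# Hypothetical toy network: (gene, gene, association weight).
edges = [
    ("g1", "g3", 0.9), ("g2", "g3", 0.8), ("g1", "g4", 0.2),
    ("g2", "g5", 0.1), ("g4", "g6", 0.7), ("g5", "g6", 0.3),
]
annotated = {"g1", "g2"}                 # genes already carrying the annotation
held_out = ["g3", "g4", "g5", "g6"]      # genes to be predicted
truth = [True, False, True, False]       # hidden labels for the held-out genes

score_by_gene = neighbor_vote_scores(edges, annotated, held_out)
scores = [score_by_gene[g] for g in held_out]
auc = roc_auc(scores, truth)
ap = average_precision(scores, truth)
print(f"AUC = {auc:.3f}, average precision = {ap:.3f}")
```

On this toy data, one true positive (g5) is only weakly linked to annotated genes, so it ranks below a negative, which lowers both metrics from their perfect value of 1.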