Using PPI network autocorrelation in hierarchical multi-label classification trees for gene function prediction

Bibliographic Details
Main authors: Stojanova, Daniela, Ceci, Michelangelo, Malerba, Donato, Dzeroski, Saso
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3850549/
https://www.ncbi.nlm.nih.gov/pubmed/24070402
http://dx.doi.org/10.1186/1471-2105-14-285
_version_ 1782294113730691072
author Stojanova, Daniela
Ceci, Michelangelo
Malerba, Donato
Dzeroski, Saso
author_sort Stojanova, Daniela
collection PubMed
description BACKGROUND: Ontologies and catalogs of gene functions, such as the Gene Ontology (GO) and MIPS-FUN, assume that functional classes are organized hierarchically, that is, general functions include more specific ones. This has recently motivated the development of several machine learning algorithms for gene function prediction that leverage this hierarchical organization, in which instances may belong to multiple classes. In addition, it is possible to exploit relationships among examples, since it is plausible that related genes tend to share functional annotations. Although these relationships have been identified and extensively studied in the area of protein-protein interaction (PPI) networks, they have not received much attention in hierarchical and multi-class gene function prediction. Relations between genes introduce autocorrelation in functional annotations and violate the assumption that instances are independently and identically distributed (i.i.d.), which underlies most machine learning algorithms. Although the explicit consideration of these relations brings additional complexity to the learning process, we expect substantial benefits in the predictive accuracy of the learned classifiers. RESULTS: This article demonstrates the benefits (in terms of predictive accuracy) of considering autocorrelation in multi-class gene function prediction. We develop a tree-based algorithm for considering network autocorrelation in the setting of Hierarchical Multi-label Classification (HMC). We empirically evaluate the proposed algorithm, called NHMC (Network Hierarchical Multi-label Classification), on 12 yeast datasets using each of the MIPS-FUN and GO annotation schemes and exploiting 2 different PPI networks. The results clearly show that taking autocorrelation into account improves the predictive performance of the learned models for predicting gene function.
CONCLUSIONS: Our newly developed method for HMC takes network information into account in the learning phase. When used for gene function prediction in the context of PPI networks, the explicit consideration of network autocorrelation increases the predictive performance of the learned models. Overall, we found that this holds for different gene features/descriptions, functional annotation schemes, and PPI networks. The best results are achieved when the PPI network is dense and contains a large proportion of function-relevant interactions.
format Online
Article
Text
id pubmed-3850549
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3850549 2013-12-16 Using PPI network autocorrelation in hierarchical multi-label classification trees for gene function prediction Stojanova, Daniela Ceci, Michelangelo Malerba, Donato Dzeroski, Saso BMC Bioinformatics Research Article BioMed Central 2013-09-26 /pmc/articles/PMC3850549/ /pubmed/24070402 http://dx.doi.org/10.1186/1471-2105-14-285 Text en Copyright © 2013 Stojanova et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Using PPI network autocorrelation in hierarchical multi-label classification trees for gene function prediction
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3850549/
https://www.ncbi.nlm.nih.gov/pubmed/24070402
http://dx.doi.org/10.1186/1471-2105-14-285