Predicting gene function using hierarchical multi-label decision tree ensembles

Bibliographic Details
Main authors: Schietgat, Leander; Vens, Celine; Struyf, Jan; Blockeel, Hendrik; Kocev, Dragi; Džeroski, Sašo
Format: Text
Language: English
Published: BioMed Central 2010
Subjects: Research article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2824675/
https://www.ncbi.nlm.nih.gov/pubmed/20044933
http://dx.doi.org/10.1186/1471-2105-11-2

Abstract
BACKGROUND: S. cerevisiae, A. thaliana and M. musculus are well-studied organisms in biology and the sequencing of their genomes was completed many years ago. It is still a challenge, however, to develop methods that assign biological functions to the ORFs in these genomes automatically. Different machine learning methods have been proposed to this end, but it remains unclear which method is to be preferred in terms of predictive performance, efficiency and usability.
RESULTS: We study the use of decision tree based models for predicting the multiple functions of ORFs. First, we describe an algorithm for learning hierarchical multi-label decision trees. These can simultaneously predict all the functions of an ORF, while respecting a given hierarchy of gene functions (such as FunCat or GO). We present new results obtained with this algorithm, showing that the trees found by it exhibit clearly better predictive performance than the trees found by previously described methods. Nevertheless, the predictive performance of individual trees is lower than that of some recently proposed statistical learning methods. We show that ensembles of such trees are more accurate than single trees and are competitive with state-of-the-art statistical learning and functional linkage methods. Moreover, the ensemble method is computationally efficient and easy to use.
CONCLUSIONS: Our results suggest that decision tree based methods are a state-of-the-art, efficient and easy-to-use approach to ORF function prediction.
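The abstract states that the trees predict all functions of an ORF at once while respecting the function hierarchy, and that ensembles of such trees outperform single trees. The Python sketch below shows one way these two properties can fit together; it is an illustration under stated assumptions, not the authors' implementation. Everything in it is hypothetical: the toy tree-shaped hierarchy with FunCat-style IDs, the leaf model (the fraction of the leaf's training examples annotated with each class), unweighted averaging as the ensemble combination, and the 0.5 threshold.

```python
from typing import Dict, Iterable, List, Optional, Set

# Hypothetical toy hierarchy (child -> parent, None at the top level).
# IDs only mimic FunCat-style numbering; they are not real annotations.
# A tree-shaped hierarchy is assumed; a DAG such as GO would need several parents per class.
PARENT: Dict[str, Optional[str]] = {
    "01": None, "01.01": "01", "01.02": "01",
    "02": None, "02.01": "02",
}
CLASSES: List[str] = sorted(PARENT)


def close_upward(labels: Iterable[str]) -> Set[str]:
    """Add all ancestors of each label, so an annotation set respects the hierarchy."""
    closed: Set[str] = set()
    for label in labels:
        node: Optional[str] = label
        while node is not None:
            closed.add(node)
            node = PARENT[node]
    return closed


def leaf_prediction(examples: List[Set[str]]) -> Dict[str, float]:
    """Assumed leaf model: fraction of the leaf's training examples carrying each class."""
    closed = [close_upward(e) for e in examples]
    return {c: sum(c in e for e in closed) / len(closed) for c in CLASSES}


def average(predictions: List[Dict[str, float]]) -> Dict[str, float]:
    """Assumed ensemble combination: unweighted mean of the per-tree class vectors."""
    return {c: sum(p[c] for p in predictions) / len(predictions) for c in CLASSES}


def threshold(pred: Dict[str, float], t: float) -> Set[str]:
    """Predict every class whose averaged score reaches the threshold t."""
    return {c for c, v in pred.items() if v >= t}


def is_consistent(labels: Set[str]) -> bool:
    """True if each predicted class also has its parent predicted."""
    return all(PARENT[c] is None or PARENT[c] in labels for c in labels)


# Hypothetical annotations of the training examples in the leaf that one
# query ORF reaches in each of three trees.
tree_leaves = [
    [{"01.01"}, {"01.01", "02.01"}, {"01.02"}],
    [{"01.01"}, {"02"}],
    [{"01.01", "02.01"}, {"01"}, {"01.01"}, {"02.01"}],
]
per_tree = [leaf_prediction(leaf) for leaf in tree_leaves]
ensemble = average(per_tree)
predicted = threshold(ensemble, 0.5)
print(sorted(predicted), is_consistent(predicted))  # → ['01', '01.01'] True
```

Because every example annotated with a class is, after upward closure, also annotated with that class's ancestors, a parent's score in a leaf is never lower than a child's. Averaging over trees keeps that ordering, so any single threshold yields a hierarchy-consistent set of predicted functions.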
Source record: pubmed-2824675, MEDLINE/PubMed format, National Center for Biotechnology Information
Published in: BMC Bioinformatics (Research article), 2010-01-02
Copyright ©2010 Schietgat et al; licensee BioMed Central Ltd.
License: This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.