Predicting gene function using hierarchical multi-label decision tree ensembles
BACKGROUND: S. cerevisiae, A. thaliana and M. musculus are well-studied organisms in biology and the sequencing of their genomes was completed many years ago. It is still a challenge, however, to develop methods that assign biological functions to the ORFs in these genomes automatically. Different m...
Main authors: | Schietgat, Leander; Vens, Celine; Struyf, Jan; Blockeel, Hendrik; Kocev, Dragi; Džeroski, Sašo |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2010 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2824675/ https://www.ncbi.nlm.nih.gov/pubmed/20044933 http://dx.doi.org/10.1186/1471-2105-11-2 |
_version_ | 1782177715352240128 |
---|---|
author | Schietgat, Leander Vens, Celine Struyf, Jan Blockeel, Hendrik Kocev, Dragi Džeroski, Sašo |
author_facet | Schietgat, Leander Vens, Celine Struyf, Jan Blockeel, Hendrik Kocev, Dragi Džeroski, Sašo |
author_sort | Schietgat, Leander |
collection | PubMed |
description | BACKGROUND: S. cerevisiae, A. thaliana and M. musculus are well-studied organisms in biology and the sequencing of their genomes was completed many years ago. It is still a challenge, however, to develop methods that assign biological functions to the ORFs in these genomes automatically. Different machine learning methods have been proposed to this end, but it remains unclear which method is to be preferred in terms of predictive performance, efficiency and usability. RESULTS: We study the use of decision tree based models for predicting the multiple functions of ORFs. First, we describe an algorithm for learning hierarchical multi-label decision trees. These can simultaneously predict all the functions of an ORF, while respecting a given hierarchy of gene functions (such as FunCat or GO). We present new results obtained with this algorithm, showing that the trees found by it exhibit clearly better predictive performance than the trees found by previously described methods. Nevertheless, the predictive performance of individual trees is lower than that of some recently proposed statistical learning methods. We show that ensembles of such trees are more accurate than single trees and are competitive with state-of-the-art statistical learning and functional linkage methods. Moreover, the ensemble method is computationally efficient and easy to use. CONCLUSIONS: Our results suggest that decision tree based methods are a state-of-the-art, efficient and easy-to-use approach to ORF function prediction. |
format | Text |
id | pubmed-2824675 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2010 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-28246752010-02-19 Predicting gene function using hierarchical multi-label decision tree ensembles Schietgat, Leander Vens, Celine Struyf, Jan Blockeel, Hendrik Kocev, Dragi Džeroski, Sašo BMC Bioinformatics Research article BACKGROUND: S. cerevisiae, A. thaliana and M. musculus are well-studied organisms in biology and the sequencing of their genomes was completed many years ago. It is still a challenge, however, to develop methods that assign biological functions to the ORFs in these genomes automatically. Different machine learning methods have been proposed to this end, but it remains unclear which method is to be preferred in terms of predictive performance, efficiency and usability. RESULTS: We study the use of decision tree based models for predicting the multiple functions of ORFs. First, we describe an algorithm for learning hierarchical multi-label decision trees. These can simultaneously predict all the functions of an ORF, while respecting a given hierarchy of gene functions (such as FunCat or GO). We present new results obtained with this algorithm, showing that the trees found by it exhibit clearly better predictive performance than the trees found by previously described methods. Nevertheless, the predictive performance of individual trees is lower than that of some recently proposed statistical learning methods. We show that ensembles of such trees are more accurate than single trees and are competitive with state-of-the-art statistical learning and functional linkage methods. Moreover, the ensemble method is computationally efficient and easy to use. CONCLUSIONS: Our results suggest that decision tree based methods are a state-of-the-art, efficient and easy-to-use approach to ORF function prediction. BioMed Central 2010-01-02 /pmc/articles/PMC2824675/ /pubmed/20044933 http://dx.doi.org/10.1186/1471-2105-11-2 Text en Copyright ©2010 Schietgat et al; licensee BioMed Central Ltd. 
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research article Schietgat, Leander Vens, Celine Struyf, Jan Blockeel, Hendrik Kocev, Dragi Džeroski, Sašo Predicting gene function using hierarchical multi-label decision tree ensembles |
title | Predicting gene function using hierarchical multi-label decision tree ensembles |
title_full | Predicting gene function using hierarchical multi-label decision tree ensembles |
title_fullStr | Predicting gene function using hierarchical multi-label decision tree ensembles |
title_full_unstemmed | Predicting gene function using hierarchical multi-label decision tree ensembles |
title_short | Predicting gene function using hierarchical multi-label decision tree ensembles |
title_sort | predicting gene function using hierarchical multi-label decision tree ensembles |
topic | Research article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2824675/ https://www.ncbi.nlm.nih.gov/pubmed/20044933 http://dx.doi.org/10.1186/1471-2105-11-2 |
work_keys_str_mv | AT schietgatleander predictinggenefunctionusinghierarchicalmultilabeldecisiontreeensembles AT vensceline predictinggenefunctionusinghierarchicalmultilabeldecisiontreeensembles AT struyfjan predictinggenefunctionusinghierarchicalmultilabeldecisiontreeensembles AT blockeelhendrik predictinggenefunctionusinghierarchicalmultilabeldecisiontreeensembles AT kocevdragi predictinggenefunctionusinghierarchicalmultilabeldecisiontreeensembles AT dzeroskisaso predictinggenefunctionusinghierarchicalmultilabeldecisiontreeensembles |