
Prediction of Human Phenotype Ontology terms by means of hierarchical ensemble methods

BACKGROUND: The prediction of human gene–abnormal phenotype associations is a fundamental step toward the discovery of novel genes associated with human disorders, especially when no genes are known to be associated with a specific disease. In this context the Human Phenotype Ontology (HPO) provides...


Bibliographic Details
Main Authors: Notaro, Marco; Schubach, Max; Robinson, Peter N.; Valentini, Giorgio
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5639780/
https://www.ncbi.nlm.nih.gov/pubmed/29025394
http://dx.doi.org/10.1186/s12859-017-1854-y
author Notaro, Marco
Schubach, Max
Robinson, Peter N.
Valentini, Giorgio
author_sort Notaro, Marco
collection PubMed
description BACKGROUND: The prediction of human gene–abnormal phenotype associations is a fundamental step toward the discovery of novel genes associated with human disorders, especially when no genes are known to be associated with a specific disease. In this context, the Human Phenotype Ontology (HPO) provides a standard categorization of the abnormalities associated with human diseases. While the prediction of gene–disease associations has been widely investigated, the related problem of predicting gene–phenotypic feature (i.e., HPO term) associations has been largely overlooked, even though no HPO term associations are known for most human genes and the HPO is increasingly applied to relevant medical problems. Moreover, most of the methods proposed in the literature cannot capture the hierarchical relationships between HPO terms, which results in inconsistent and relatively inaccurate predictions. RESULTS: We present two hierarchical ensemble methods that we formally prove to provide biologically consistent predictions according to the hierarchical structure of the HPO. The modular structure of the proposed methods, which consists of a “flat” learning first step and a hierarchical combination of the predictions in the second step, allows the predictions of virtually any flat learning method to be enhanced. The experimental results show that hierarchical ensemble methods can predict novel associations between genes and abnormal phenotypes with results that are competitive with state-of-the-art algorithms and with a significant reduction in computational complexity. CONCLUSIONS: Hierarchical ensembles are efficient computational methods that guarantee biologically meaningful predictions that obey the true path rule, and they can be used to improve HPO term predictions and make them consistent, starting from virtually any flat learning method. The implementation of the proposed methods is available as an R package from the CRAN repository. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-017-1854-y) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5639780
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5639780 2017-10-18 Prediction of Human Phenotype Ontology terms by means of hierarchical ensemble methods Notaro, Marco; Schubach, Max; Robinson, Peter N.; Valentini, Giorgio BMC Bioinformatics Research Article BioMed Central 2017-10-12 /pmc/articles/PMC5639780/ /pubmed/29025394 http://dx.doi.org/10.1186/s12859-017-1854-y Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Prediction of Human Phenotype Ontology terms by means of hierarchical ensemble methods
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5639780/
https://www.ncbi.nlm.nih.gov/pubmed/29025394
http://dx.doi.org/10.1186/s12859-017-1854-y
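The abstract in this record describes a two-step scheme: a "flat" learner scores each HPO term independently, and a hierarchical step then combines those scores so that they obey the true path rule (a gene annotated to a term is also annotated to all of that term's ancestors). The following is a minimal sketch of one generic way to enforce that constraint, namely a top-down pass that clips each term's score to the minimum corrected score of its parents. It is an illustration under stated assumptions and not necessarily either of the two methods the authors propose; the function name `top_down_consistent`, the toy term names, and the scores are all hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def top_down_consistent(flat_scores, parents):
    """Return scores in which no term scores higher than any of its parents.

    flat_scores: dict mapping term -> score from a flat learner (assumed input).
    parents:     dict mapping term -> iterable of parent terms in the DAG.
    Every parent is assumed to appear as a key in flat_scores.
    """
    # TopologicalSorter takes node -> predecessors, so parents come out first.
    graph = {term: set(parents.get(term, ())) for term in flat_scores}
    corrected = {}
    for term in TopologicalSorter(graph).static_order():
        score = flat_scores[term]
        for parent in graph[term]:
            # Clip the child so that it cannot exceed an already-corrected parent.
            score = min(score, corrected[parent])
        corrected[term] = score
    return corrected


# Toy DAG (hypothetical terms): "C" has two parents, "A" and "B".
toy_parents = {"A": ["root"], "B": ["root"], "C": ["A", "B"]}
toy_flat = {"root": 0.9, "A": 0.4, "B": 0.8, "C": 0.7}

toy_corrected = top_down_consistent(toy_flat, toy_parents)
print(toy_corrected)
```

In the toy input the flat score of "C" (0.7) exceeds that of its parent "A" (0.4), which violates the true path rule; after the top-down pass "C" is lowered to 0.4 while the other scores are unchanged. Because the correction only reads the flat scores, it can be applied on top of any flat learning method, which mirrors the modular design the abstract describes.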