
Methodology for the inference of gene function from phenotype data


Bibliographic Details
Main Authors: Ascensao, Joao A; Dolan, Mary E; Hill, David P; Blake, Judith A
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects: Methodology Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4302099/
https://www.ncbi.nlm.nih.gov/pubmed/25495798
http://dx.doi.org/10.1186/s12859-014-0405-z
Collection: PubMed
Description:
BACKGROUND: Biomedical ontologies are increasingly instrumental in the advancement of biological research primarily through their use to efficiently consolidate large amounts of data into structured, accessible sets. However, ontology development and usage can be hampered by the segregation of knowledge by domain that occurs due to independent development and use of the ontologies. The ability to infer data associated with one ontology to data associated with another ontology would prove useful in expanding information content and scope. We here focus on relating two ontologies: the Gene Ontology (GO), which encodes canonical gene function, and the Mammalian Phenotype Ontology (MP), which describes non-canonical phenotypes, using statistical methods to suggest GO functional annotations from existing MP phenotype annotations. This work is in contrast to previous studies that have focused on inferring gene function from phenotype primarily through lexical or semantic similarity measures.
RESULTS: We have designed and tested a set of algorithms that represents a novel methodology to define rules for predicting gene function by examining the emergent structure and relationships between the gene functions and phenotypes rather than inspecting the terms semantically. The algorithms inspect relationships among multiple phenotype terms to deduce if there are cases where they all arise from a single gene function. We apply this methodology to data about genes in the laboratory mouse that are formally represented in the Mouse Genome Informatics (MGI) resource. From the data, 7444 rule instances were generated from five generalized rules, resulting in 4818 unique GO functional predictions for 1796 genes.
CONCLUSIONS: We show that our method is capable of inferring high-quality functional annotations from curated phenotype data. As well as creating inferred annotations, our method has the potential to allow for the elucidation of unforeseen, biologically significant associations between gene function and phenotypes that would be overlooked by a semantics-based approach. Future work will include the implementation of the described algorithms for a variety of other model organism databases, taking full advantage of the abundance of available high quality curated data.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-014-0405-z) contains supplementary material, which is available to authorized users.
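The RESULTS paragraph describes rules that link a combination of phenotype (MP) terms to a single GO function. As a rough illustration only, the sketch below uses a generic association-rule formulation (support and confidence over pairs of MP terms annotated to the same gene). This is an assumed stand-in technique, not the authors' algorithm or their five generalized rules; the function names `mine_rules` and `predict`, the thresholds, and the gene and term identifiers are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def mine_rules(mp_annotations, go_annotations, min_support=3, min_confidence=0.8):
    """Mine hypothetical rules of the form {MP term, MP term} -> GO term.

    Returns a list of (mp_pair, go_term, support, confidence) tuples.
    """
    # Group genes by every pair of MP terms they are annotated with.
    pair_genes = defaultdict(set)
    for gene, mp_terms in mp_annotations.items():
        for pair in combinations(sorted(mp_terms), 2):
            pair_genes[frozenset(pair)].add(gene)

    rules = []
    for pair, genes in pair_genes.items():
        support = len(genes)
        if support < min_support:
            continue  # too few genes share this phenotype pair
        # Count how many of those genes carry each GO term.
        go_counts = defaultdict(int)
        for gene in genes:
            for go_term in go_annotations.get(gene, ()):
                go_counts[go_term] += 1
        for go_term, count in go_counts.items():
            confidence = count / support
            if confidence >= min_confidence:
                rules.append((pair, go_term, support, confidence))
    return rules


def predict(rules, mp_annotations, go_annotations):
    """Propose (gene, GO term) pairs for genes that match a rule but lack its GO term."""
    predictions = set()
    for pair, go_term, _support, _confidence in rules:
        for gene, mp_terms in mp_annotations.items():
            if pair <= set(mp_terms) and go_term not in go_annotations.get(gene, set()):
                predictions.add((gene, go_term))
    return predictions


# Toy data with hypothetical identifiers (not real MGI, MP, or GO records).
mp = {g: {"MP:a", "MP:b"} for g in ("g1", "g2", "g4", "g5")}
mp["g3"] = {"MP:a", "MP:b", "MP:c"}
go = {g: {"GO:x"} for g in ("g1", "g2", "g3", "g4")}

rules = mine_rules(mp, go)
print(predict(rules, mp, go))  # {('g5', 'GO:x')}
```

In this toy data, four of the five genes sharing the MP pair carry `GO:x`, so the rule reaches confidence 4/5 = 0.8 and `g5` receives the proposed annotation. A realistic pipeline would likely also need to consider the MP and GO hierarchies and annotation evidence, which this sketch ignores.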
Record ID: pubmed-4302099
Institution: National Center for Biotechnology Information
Record Format: MEDLINE/PubMed
Source: BMC Bioinformatics, Methodology Article. BioMed Central, 2014-12-12 (record pubmed-4302099, 2015-02-03).
License: © Ascensao et al.; licensee BioMed Central. 2014. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.