Methodology for the inference of gene function from phenotype data
BACKGROUND: Biomedical ontologies are increasingly instrumental in the advancement of biological research primarily through their use to efficiently consolidate large amounts of data into structured, accessible sets. However, ontology development and usage can be hampered by the segregation of knowl...
Main authors: | Ascensao, Joao A; Dolan, Mary E; Hill, David P; Blake, Judith A |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2014 |
Subjects: | Methodology Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4302099/ https://www.ncbi.nlm.nih.gov/pubmed/25495798 http://dx.doi.org/10.1186/s12859-014-0405-z |
_version_ | 1782353736853618688 |
---|---|
author | Ascensao, Joao A Dolan, Mary E Hill, David P Blake, Judith A |
author_facet | Ascensao, Joao A Dolan, Mary E Hill, David P Blake, Judith A |
author_sort | Ascensao, Joao A |
collection | PubMed |
description | BACKGROUND: Biomedical ontologies are increasingly instrumental in the advancement of biological research primarily through their use to efficiently consolidate large amounts of data into structured, accessible sets. However, ontology development and usage can be hampered by the segregation of knowledge by domain that occurs due to independent development and use of the ontologies. The ability to infer data associated with one ontology to data associated with another ontology would prove useful in expanding information content and scope. We here focus on relating two ontologies: the Gene Ontology (GO), which encodes canonical gene function, and the Mammalian Phenotype Ontology (MP), which describes non-canonical phenotypes, using statistical methods to suggest GO functional annotations from existing MP phenotype annotations. This work is in contrast to previous studies that have focused on inferring gene function from phenotype primarily through lexical or semantic similarity measures. RESULTS: We have designed and tested a set of algorithms that represents a novel methodology to define rules for predicting gene function by examining the emergent structure and relationships between the gene functions and phenotypes rather than inspecting the terms semantically. The algorithms inspect relationships among multiple phenotype terms to deduce if there are cases where they all arise from a single gene function. We apply this methodology to data about genes in the laboratory mouse that are formally represented in the Mouse Genome Informatics (MGI) resource. From the data, 7444 rule instances were generated from five generalized rules, resulting in 4818 unique GO functional predictions for 1796 genes. CONCLUSIONS: We show that our method is capable of inferring high-quality functional annotations from curated phenotype data. 
As well as creating inferred annotations, our method has the potential to allow for the elucidation of unforeseen, biologically significant associations between gene function and phenotypes that would be overlooked by a semantics-based approach. Future work will include the implementation of the described algorithms for a variety of other model organism databases, taking full advantage of the abundance of available high-quality curated data. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-014-0405-z) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-4302099 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-43020992015-02-03 Methodology for the inference of gene function from phenotype data Ascensao, Joao A Dolan, Mary E Hill, David P Blake, Judith A BMC Bioinformatics Methodology Article BACKGROUND: Biomedical ontologies are increasingly instrumental in the advancement of biological research primarily through their use to efficiently consolidate large amounts of data into structured, accessible sets. However, ontology development and usage can be hampered by the segregation of knowledge by domain that occurs due to independent development and use of the ontologies. The ability to infer data associated with one ontology to data associated with another ontology would prove useful in expanding information content and scope. We here focus on relating two ontologies: the Gene Ontology (GO), which encodes canonical gene function, and the Mammalian Phenotype Ontology (MP), which describes non-canonical phenotypes, using statistical methods to suggest GO functional annotations from existing MP phenotype annotations. This work is in contrast to previous studies that have focused on inferring gene function from phenotype primarily through lexical or semantic similarity measures. RESULTS: We have designed and tested a set of algorithms that represents a novel methodology to define rules for predicting gene function by examining the emergent structure and relationships between the gene functions and phenotypes rather than inspecting the terms semantically. The algorithms inspect relationships among multiple phenotype terms to deduce if there are cases where they all arise from a single gene function. We apply this methodology to data about genes in the laboratory mouse that are formally represented in the Mouse Genome Informatics (MGI) resource. From the data, 7444 rule instances were generated from five generalized rules, resulting in 4818 unique GO functional predictions for 1796 genes. 
CONCLUSIONS: We show that our method is capable of inferring high-quality functional annotations from curated phenotype data. As well as creating inferred annotations, our method has the potential to allow for the elucidation of unforeseen, biologically significant associations between gene function and phenotypes that would be overlooked by a semantics-based approach. Future work will include the implementation of the described algorithms for a variety of other model organism databases, taking full advantage of the abundance of available high-quality curated data. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-014-0405-z) contains supplementary material, which is available to authorized users. BioMed Central 2014-12-12 /pmc/articles/PMC4302099/ /pubmed/25495798 http://dx.doi.org/10.1186/s12859-014-0405-z Text en © Ascensao et al.; licensee BioMed Central. 2014 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Methodology Article Ascensao, Joao A Dolan, Mary E Hill, David P Blake, Judith A Methodology for the inference of gene function from phenotype data |
title | Methodology for the inference of gene function from phenotype data |
title_full | Methodology for the inference of gene function from phenotype data |
title_fullStr | Methodology for the inference of gene function from phenotype data |
title_full_unstemmed | Methodology for the inference of gene function from phenotype data |
title_short | Methodology for the inference of gene function from phenotype data |
title_sort | methodology for the inference of gene function from phenotype data |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4302099/ https://www.ncbi.nlm.nih.gov/pubmed/25495798 http://dx.doi.org/10.1186/s12859-014-0405-z |
work_keys_str_mv | AT ascensaojoaoa methodologyfortheinferenceofgenefunctionfromphenotypedata AT dolanmarye methodologyfortheinferenceofgenefunctionfromphenotypedata AT hilldavidp methodologyfortheinferenceofgenefunctionfromphenotypedata AT blakejuditha methodologyfortheinferenceofgenefunctionfromphenotypedata |
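
The description above outlines a structure-based approach: look at sets of phenotype (MP) terms and ask whether genes carrying all of them tend to share one gene function (GO) term. The record does not spell out the five generalized rules or the statistics used, so the following is only a minimal sketch of that general idea, using a simple co-occurrence confidence threshold as a stand-in. The function names, the thresholds, and the toy gene and term identifiers are all hypothetical and are not taken from the article or from MGI.

```python
from itertools import combinations

# Hypothetical toy annotations: placeholder gene names and term IDs, not real MGI data.
TOY_MP = {
    "gene1": {"MP:a", "MP:b"},
    "gene2": {"MP:a", "MP:b"},
    "gene3": {"MP:a", "MP:b", "MP:c"},
    "gene4": {"MP:a", "MP:b"},  # has phenotypes but no GO annotation yet
    "gene5": {"MP:a"},
}
TOY_GO = {
    "gene1": {"GO:x"},
    "gene2": {"GO:x"},
    "gene3": {"GO:x", "GO:y"},
}


def derive_rules(mp_annotations, go_annotations, set_size=2,
                 min_support=3, min_confidence=0.75):
    """Return {(mp_term_tuple, go_term): confidence} for co-occurrence rules.

    Assumption: a rule is kept when at least `min_support` GO-annotated genes
    carry every MP term in the set, and at least `min_confidence` of those
    genes share the GO term. This is an illustrative criterion only.
    """
    rules = {}
    all_mp = sorted({t for terms in mp_annotations.values() for t in terms})
    for mp_set in combinations(all_mp, set_size):
        wanted = set(mp_set)
        # Genes showing every phenotype in the set and having some GO annotation.
        carriers = [g for g, terms in mp_annotations.items()
                    if wanted <= terms and g in go_annotations]
        if len(carriers) < min_support:
            continue
        counts = {}
        for gene in carriers:
            for go_term in go_annotations[gene]:
                counts[go_term] = counts.get(go_term, 0) + 1
        for go_term, n in counts.items():
            confidence = n / len(carriers)
            if confidence >= min_confidence:
                rules[(mp_set, go_term)] = confidence
    return rules


def predict(rules, mp_annotations, go_annotations):
    """Suggest (gene, go_term) pairs for genes matching a rule but lacking the GO term."""
    predictions = set()
    for mp_set, go_term in rules:
        wanted = set(mp_set)
        for gene, terms in mp_annotations.items():
            if wanted <= terms and go_term not in go_annotations.get(gene, set()):
                predictions.add((gene, go_term))
    return predictions


rules = derive_rules(TOY_MP, TOY_GO)
suggested = predict(rules, TOY_MP, TOY_GO)
```

In this toy data, the only phenotype pair with enough annotated carriers is the pair shared by gene1 through gene3, so the single derived rule suggests the shared GO term for gene4. A real pipeline would also need to account for the ontology hierarchies and for statistical significance, which this sketch ignores.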