
Mining phenotypes for gene function prediction



Bibliographic details
Main authors: Groth, Philip; Weiss, Bertram; Pohlenz, Hans-Dieter; Leser, Ulf
Format: Text
Language: English
Published: BioMed Central, 2008
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2311305/
https://www.ncbi.nlm.nih.gov/pubmed/18315868
http://dx.doi.org/10.1186/1471-2105-9-136
Collection: PubMed

Abstract

BACKGROUND: Health and disease of organisms are reflected in their phenotypes. Often, a genetic component to a disease is discovered only after clearly defining its phenotype. In the past years, many technologies to systematically generate phenotypes in a high-throughput manner, such as RNA interference or gene knock-out, have been developed and used to decipher functions for genes. However, there have been relatively few efforts to make use of phenotype data beyond the single genotype-phenotype relationships.

RESULTS: We present results on a study where we use a large set of phenotype data (in textual form) to predict gene annotation. To this end, we use text clustering to group genes based on their phenotype descriptions. We show that these clusters correlate well with several indicators for biological coherence in gene groups, such as functional annotations from the Gene Ontology (GO) and protein-protein interactions. We exploit these clusters for predicting gene function by carrying over annotations from well-annotated genes to other, less-characterized genes in the same cluster. For a subset of groups selected by applying objective criteria, we can predict GO-term annotations from the biological process sub-ontology with up to 72.6% precision and 16.7% recall, as evaluated by cross-validation. We manually verified some of these clusters and found them to exhibit high biological coherence, e.g. a group containing all available antennal Drosophila odorant receptors despite inconsistent GO-annotations.

CONCLUSION: The intrinsic nature of phenotypes to visibly reflect genetic activity underlines their usefulness in inferring new gene functions. Thus, systematically analyzing these data on a large scale offers many possibilities for inferring functional annotation of genes. We show that text clustering can play an important role in this process.
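The annotation-transfer step summarized in the abstract can be illustrated with a small sketch. It assumes the gene clusters have already been produced by some text-clustering step over phenotype descriptions, and it uses a hypothetical majority-style rule: a GO term is carried over to a gene when at least `min_support` of the other annotated members of its cluster carry that term. Leave-one-out evaluation stands in here as a simple form of cross-validation. The function names, the threshold, and the toy genes and term labels (`g1` to `g4`, `GO:A` to `GO:C`) are illustrative assumptions, not the authors' implementation or data.

```python
from collections import Counter

# Toy data (hypothetical): one phenotype-based cluster and placeholder GO terms.
# g4 is an uncharacterized gene with no annotations yet.
clusters = [["g1", "g2", "g3", "g4"]]
annotations = {
    "g1": {"GO:A", "GO:B"},
    "g2": {"GO:A"},
    "g3": {"GO:A", "GO:C"},
    "g4": set(),
}


def predict_terms(cluster, annotations, target, min_support=0.5):
    """Predict terms for `target` from the other annotated members of its cluster.

    Assumed transfer rule: a term is carried over when at least `min_support`
    of the other annotated members carry it. The target's own terms are never used.
    """
    donors = [g for g in cluster if g != target and annotations.get(g)]
    if not donors:
        return set()
    counts = Counter(term for g in donors for term in annotations[g])
    return {t for t, n in counts.items() if n / len(donors) >= min_support}


def leave_one_out(clusters, annotations, min_support=0.5):
    """Hold out each annotated gene in turn; return micro-averaged (precision, recall)."""
    tp = fp = fn = 0
    for cluster in clusters:
        for gene in cluster:
            truth = annotations.get(gene, set())
            if not truth:
                continue  # genes without annotations cannot be scored
            predicted = predict_terms(cluster, annotations, gene, min_support)
            tp += len(predicted & truth)
            fp += len(predicted - truth)
            fn += len(truth - predicted)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Carry annotations over to the unannotated cluster member g4.
    print(sorted(predict_terms(clusters[0], annotations, "g4")))  # ['GO:A']
    for support in (0.5, 0.6):
        p, r = leave_one_out(clusters, annotations, support)
        print(f"min_support={support}: precision={p:.3f} recall={r:.3f}")
    # min_support=0.5: precision=0.429 recall=0.600
    # min_support=0.6: precision=1.000 recall=0.600
```

On this toy data, raising `min_support` from 0.5 to 0.6 removes every false positive without lowering recall, which shows how a stricter transfer rule favors precision. These numbers come from the invented example only and say nothing about the results reported in the paper.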
Record ID: pubmed-2311305
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: BMC Bioinformatics (Research Article), published 2008-03-03
Copyright © 2008 Groth et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.