
Genome wide prediction of protein function via a generic knowledge discovery approach based on evidence integration

BACKGROUND: The automation of many common molecular biology techniques has resulted in the accumulation of vast quantities of experimental data. One of the major challenges now facing researchers is how to process this data to yield useful information about a biological system (e.g. knowledge of gen...


Bibliographic Details
Main Authors: Xiong, Jianghui, Rayner, Simon, Luo, Kunyi, Li, Yinghui, Chen, Shanguang
Format: Text
Language: English
Published: BioMed Central 2006
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1481625/
https://www.ncbi.nlm.nih.gov/pubmed/16725034
http://dx.doi.org/10.1186/1471-2105-7-268
_version_ 1782128279450288128
author Xiong, Jianghui
Rayner, Simon
Luo, Kunyi
Li, Yinghui
Chen, Shanguang
collection PubMed
description BACKGROUND: The automation of many common molecular biology techniques has resulted in the accumulation of vast quantities of experimental data. One of the major challenges now facing researchers is how to process this data to yield useful information about a biological system (e.g. knowledge of genes and their products, and the biological roles of proteins, their molecular functions, localizations and interaction networks). We present a technique called Global Mapping of Unknown Proteins (GMUP) which uses the Gene Ontology Index to relate diverse sources of experimental data by creation of an abstraction layer of evidence data. This abstraction layer is used as input to a neural network which, once trained, can be used to predict function from the evidence data of unannotated proteins. The method allows us to include almost any experimental data set related to protein function, which incorporates the Gene Ontology, to our evidence data in order to seek relationships between the different sets. RESULTS: We have demonstrated the capabilities of this method in two ways. We first collected various experimental datasets associated with yeast (Saccharomyces cerevisiae) and applied the technique to a set of previously annotated open reading frames (ORFs). These ORFs were divided into training and test sets and were used to examine the accuracy of the predictions made by our method. Then we applied GMUP to previously un-annotated ORFs and made 1980, 836 and 1969 predictions corresponding to the GO Biological Process, Molecular Function and Cellular Component sub-categories respectively. We found that GMUP was particularly successful at predicting ORFs with functions associated with the ribonucleoprotein complex, protein metabolism and transportation. CONCLUSION: This study presents a global and generic gene knowledge discovery approach based on evidence integration of various genome-scale data. 
It can be used to provide insight as to how certain biological processes are implemented by interaction and coordination of proteins, which may serve as a guide for future analysis. New data can be readily incorporated as it becomes available to provide more reliable predictions or further insights into processes and interactions.
format Text
id pubmed-1481625
institution National Center for Biotechnology Information
language English
publishDate 2006
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-14816252006-06-22 Genome wide prediction of protein function via a generic knowledge discovery approach based on evidence integration Xiong, Jianghui Rayner, Simon Luo, Kunyi Li, Yinghui Chen, Shanguang BMC Bioinformatics Research Article BACKGROUND: The automation of many common molecular biology techniques has resulted in the accumulation of vast quantities of experimental data. One of the major challenges now facing researchers is how to process this data to yield useful information about a biological system (e.g. knowledge of genes and their products, and the biological roles of proteins, their molecular functions, localizations and interaction networks). We present a technique called Global Mapping of Unknown Proteins (GMUP) which uses the Gene Ontology Index to relate diverse sources of experimental data by creation of an abstraction layer of evidence data. This abstraction layer is used as input to a neural network which, once trained, can be used to predict function from the evidence data of unannotated proteins. The method allows us to include almost any experimental data set related to protein function, which incorporates the Gene Ontology, to our evidence data in order to seek relationships between the different sets. RESULTS: We have demonstrated the capabilities of this method in two ways. We first collected various experimental datasets associated with yeast (Saccharomyces cerevisiae) and applied the technique to a set of previously annotated open reading frames (ORFs). These ORFs were divided into training and test sets and were used to examine the accuracy of the predictions made by our method. Then we applied GMUP to previously un-annotated ORFs and made 1980, 836 and 1969 predictions corresponding to the GO Biological Process, Molecular Function and Cellular Component sub-categories respectively. 
We found that GMUP was particularly successful at predicting ORFs with functions associated with the ribonucleoprotein complex, protein metabolism and transportation. CONCLUSION: This study presents a global and generic gene knowledge discovery approach based on evidence integration of various genome-scale data. It can be used to provide insight as to how certain biological processes are implemented by interaction and coordination of proteins, which may serve as a guide for future analysis. New data can be readily incorporated as it becomes available to provide more reliable predictions or further insights into processes and interactions. BioMed Central 2006-05-25 /pmc/articles/PMC1481625/ /pubmed/16725034 http://dx.doi.org/10.1186/1471-2105-7-268 Text en Copyright © 2006 Xiong et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Genome wide prediction of protein function via a generic knowledge discovery approach based on evidence integration
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1481625/
https://www.ncbi.nlm.nih.gov/pubmed/16725034
http://dx.doi.org/10.1186/1471-2105-7-268