Genome wide prediction of protein function via a generic knowledge discovery approach based on evidence integration
BACKGROUND: The automation of many common molecular biology techniques has resulted in the accumulation of vast quantities of experimental data. One of the major challenges now facing researchers is how to process this data to yield useful information about a biological system (e.g. knowledge of gen...
Main authors: | Xiong, Jianghui; Rayner, Simon; Luo, Kunyi; Li, Yinghui; Chen, Shanguang |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2006 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1481625/ https://www.ncbi.nlm.nih.gov/pubmed/16725034 http://dx.doi.org/10.1186/1471-2105-7-268 |
_version_ | 1782128279450288128 |
---|---|
author | Xiong, Jianghui Rayner, Simon Luo, Kunyi Li, Yinghui Chen, Shanguang |
author_facet | Xiong, Jianghui Rayner, Simon Luo, Kunyi Li, Yinghui Chen, Shanguang |
author_sort | Xiong, Jianghui |
collection | PubMed |
description | BACKGROUND: The automation of many common molecular biology techniques has resulted in the accumulation of vast quantities of experimental data. One of the major challenges now facing researchers is how to process this data to yield useful information about a biological system (e.g. knowledge of genes and their products, and the biological roles of proteins, their molecular functions, localizations and interaction networks). We present a technique called Global Mapping of Unknown Proteins (GMUP) which uses the Gene Ontology Index to relate diverse sources of experimental data by creation of an abstraction layer of evidence data. This abstraction layer is used as input to a neural network which, once trained, can be used to predict function from the evidence data of unannotated proteins. The method allows us to include almost any experimental data set related to protein function, which incorporates the Gene Ontology, to our evidence data in order to seek relationships between the different sets. RESULTS: We have demonstrated the capabilities of this method in two ways. We first collected various experimental datasets associated with yeast (Saccharomyces cerevisiae) and applied the technique to a set of previously annotated open reading frames (ORFs). These ORFs were divided into training and test sets and were used to examine the accuracy of the predictions made by our method. Then we applied GMUP to previously un-annotated ORFs and made 1980, 836 and 1969 predictions corresponding to the GO Biological Process, Molecular Function and Cellular Component sub-categories respectively. We found that GMUP was particularly successful at predicting ORFs with functions associated with the ribonucleoprotein complex, protein metabolism and transportation. CONCLUSION: This study presents a global and generic gene knowledge discovery approach based on evidence integration of various genome-scale data. It can be used to provide insight as to how certain biological processes are implemented by interaction and coordination of proteins, which may serve as a guide for future analysis. New data can be readily incorporated as it becomes available to provide more reliable predictions or further insights into processes and interactions. |
format | Text |
id | pubmed-1481625 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2006 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-1481625 2006-06-22 Genome wide prediction of protein function via a generic knowledge discovery approach based on evidence integration Xiong, Jianghui Rayner, Simon Luo, Kunyi Li, Yinghui Chen, Shanguang BMC Bioinformatics Research Article BACKGROUND: The automation of many common molecular biology techniques has resulted in the accumulation of vast quantities of experimental data. One of the major challenges now facing researchers is how to process this data to yield useful information about a biological system (e.g. knowledge of genes and their products, and the biological roles of proteins, their molecular functions, localizations and interaction networks). We present a technique called Global Mapping of Unknown Proteins (GMUP) which uses the Gene Ontology Index to relate diverse sources of experimental data by creation of an abstraction layer of evidence data. This abstraction layer is used as input to a neural network which, once trained, can be used to predict function from the evidence data of unannotated proteins. The method allows us to include almost any experimental data set related to protein function, which incorporates the Gene Ontology, to our evidence data in order to seek relationships between the different sets. RESULTS: We have demonstrated the capabilities of this method in two ways. We first collected various experimental datasets associated with yeast (Saccharomyces cerevisiae) and applied the technique to a set of previously annotated open reading frames (ORFs). These ORFs were divided into training and test sets and were used to examine the accuracy of the predictions made by our method. Then we applied GMUP to previously un-annotated ORFs and made 1980, 836 and 1969 predictions corresponding to the GO Biological Process, Molecular Function and Cellular Component sub-categories respectively. We found that GMUP was particularly successful at predicting ORFs with functions associated with the ribonucleoprotein complex, protein metabolism and transportation. CONCLUSION: This study presents a global and generic gene knowledge discovery approach based on evidence integration of various genome-scale data. It can be used to provide insight as to how certain biological processes are implemented by interaction and coordination of proteins, which may serve as a guide for future analysis. New data can be readily incorporated as it becomes available to provide more reliable predictions or further insights into processes and interactions. BioMed Central 2006-05-25 /pmc/articles/PMC1481625/ /pubmed/16725034 http://dx.doi.org/10.1186/1471-2105-7-268 Text en Copyright © 2006 Xiong et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Xiong, Jianghui Rayner, Simon Luo, Kunyi Li, Yinghui Chen, Shanguang Genome wide prediction of protein function via a generic knowledge discovery approach based on evidence integration |
title | Genome wide prediction of protein function via a generic knowledge discovery approach based on evidence integration |
title_full | Genome wide prediction of protein function via a generic knowledge discovery approach based on evidence integration |
title_fullStr | Genome wide prediction of protein function via a generic knowledge discovery approach based on evidence integration |
title_full_unstemmed | Genome wide prediction of protein function via a generic knowledge discovery approach based on evidence integration |
title_short | Genome wide prediction of protein function via a generic knowledge discovery approach based on evidence integration |
title_sort | genome wide prediction of protein function via a generic knowledge discovery approach based on evidence integration |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1481625/ https://www.ncbi.nlm.nih.gov/pubmed/16725034 http://dx.doi.org/10.1186/1471-2105-7-268 |
work_keys_str_mv | AT xiongjianghui genomewidepredictionofproteinfunctionviaagenericknowledgediscoveryapproachbasedonevidenceintegration AT raynersimon genomewidepredictionofproteinfunctionviaagenericknowledgediscoveryapproachbasedonevidenceintegration AT luokunyi genomewidepredictionofproteinfunctionviaagenericknowledgediscoveryapproachbasedonevidenceintegration AT liyinghui genomewidepredictionofproteinfunctionviaagenericknowledgediscoveryapproachbasedonevidenceintegration AT chenshanguang genomewidepredictionofproteinfunctionviaagenericknowledgediscoveryapproachbasedonevidenceintegration |