
An FPT Approach for Predicting Protein Localization from Yeast Genomic Data

Accurately predicting the localization of proteins is of paramount importance in the quest to determine their respective functions within the cellular compartment. Because of the continuous and rapid progress in the fields of genomics and proteomics, more data are available now than ever before. Coi...


Bibliographic Details
Main authors: Wang, Jin; Li, Chunhe; Wang, Erkang; Wang, Xidi
Format: Text
Language: English
Published: Public Library of Science 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3023707/
https://www.ncbi.nlm.nih.gov/pubmed/21283516
http://dx.doi.org/10.1371/journal.pone.0014449
author Wang, Jin
Li, Chunhe
Wang, Erkang
Wang, Xidi
collection PubMed
description Accurately predicting the localization of proteins is of paramount importance in the quest to determine their respective functions within the cellular compartment. Because of the continuous and rapid progress in the fields of genomics and proteomics, more data are available now than ever before. In parallel, data mining methods have been developed and refined to handle this experimental windfall, allowing the scientific community to quantitatively address long-standing questions such as that of protein localization. Here, we develop a frequent pattern tree (FPT) approach to generate a minimum set of rules (mFPT) for predicting protein localization. We acquire a series of rules according to the features of yeast genomic data. The mFPT prediction accuracy is benchmarked against other commonly used methods, such as Bayesian networks and logistic regression, under various statistical measures. Our results show that mFPT performed better than the other approaches in predicting protein localization. Setting 0.65 as the minimum hit-rate, we obtained 138 proteins that mFPT predicted differently from the simple naive Bayesian method (SNB). In our analysis of these 138 proteins, we present novel localization predictions for 17 proteins that currently have no defined localization. These predictions can serve as putative annotations and should provide preliminary clues for experimentalists. We also compared our predictions against the eukaryotic subcellular localization database and against related protein localization predictions by others. Our method is quite general and can thus be applied to discover the underlying rules for protein-protein interactions, genomic interactions, and structure-function relationships, as well as those of other fields of research.
format Text
id pubmed-3023707
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3023707 2011-01-31 An FPT Approach for Predicting Protein Localization from Yeast Genomic Data Wang, Jin Li, Chunhe Wang, Erkang Wang, Xidi PLoS One Research Article
Public Library of Science 2011-01-19 /pmc/articles/PMC3023707/ /pubmed/21283516 http://dx.doi.org/10.1371/journal.pone.0014449 Text en Wang et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title An FPT Approach for Predicting Protein Localization from Yeast Genomic Data
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3023707/
https://www.ncbi.nlm.nih.gov/pubmed/21283516
http://dx.doi.org/10.1371/journal.pone.0014449