
Prediction of Cancer Proteins by Integrating Protein Interaction, Domain Frequency, and Domain Interaction Data Using Machine Learning Algorithms

Many proteins are known to be associated with cancer, yet their precise functional role in disease pathogenesis often remains unclear. One strategy for better understanding the function of these proteins is to combine different aspects of proteomics...


Bibliographic Details
Main Authors: Huang, Chien-Hung, Peng, Huai-Shun, Ng, Ka-Lok
Format: Online Article Text
Language: English
Published: Hindawi Publishing Corporation 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4381656/
https://www.ncbi.nlm.nih.gov/pubmed/25866773
http://dx.doi.org/10.1155/2015/312047
author Huang, Chien-Hung
Peng, Huai-Shun
Ng, Ka-Lok
collection PubMed
description Many proteins are known to be associated with cancer, yet their precise functional role in disease pathogenesis often remains unclear. One strategy for better understanding the function of these proteins is to combine different types of proteomics data. In this study, we extended Aragues's method by employing protein-protein interaction (PPI) data, domain-domain interaction (DDI) data, the weighted domain frequency score (DFS), and cancer linker degree (CLD) data to predict cancer proteins. Performance was benchmarked in three kinds of experiments: (I) using individual algorithms, (II) combining algorithms, and (III) combining algorithms of the same classification type. Compared with Aragues's method, our proposed methods, that is, machine learning algorithms and majority voting, are significantly superior in all seven performance measures. We demonstrated the accuracy of the proposed method on two independent datasets. The best algorithm achieves hit ratios of 89.4% and 72.8% on a lung cancer dataset and a lung cancer microarray study, respectively. It is anticipated that the current research could help in understanding disease mechanisms and in diagnosis.
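As an illustration of the "voting with the majority" idea mentioned in the description, the following Python sketch combines binary predictions (1 = cancer protein, 0 = non-cancer protein) from several classifiers and reports a hit ratio. It is not the authors' implementation: the function names, the tie-breaking rule, the example labels, and the reading of "hit ratio" as the fraction of known cancer proteins that are predicted as cancer proteins are all assumptions made for this example.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine per-classifier 0/1 label lists into one label per protein.

    Hypothetical helper: ties are resolved to 0 (non-cancer), which is an
    assumption of this sketch and not a rule taken from the paper.
    """
    if not predictions:
        raise ValueError("need predictions from at least one classifier")
    n_items = len(predictions[0])
    if any(len(p) != n_items for p in predictions):
        raise ValueError("all classifiers must label the same proteins")

    combined = []
    for votes in zip(*predictions):  # votes for one protein across classifiers
        counts = Counter(votes)
        combined.append(1 if counts[1] > counts[0] else 0)
    return combined


def hit_ratio(predicted_for_known_positives: Sequence[int]) -> float:
    """Fraction of known cancer proteins predicted as cancer (assumed definition)."""
    if not predicted_for_known_positives:
        raise ValueError("need at least one known cancer protein")
    return sum(predicted_for_known_positives) / len(predicted_for_known_positives)


if __name__ == "__main__":
    # Made-up labels from three hypothetical classifiers for four known cancer proteins.
    clf_a = [1, 0, 1, 1]
    clf_b = [1, 1, 0, 1]
    clf_c = [0, 0, 1, 1]
    voted = majority_vote([clf_a, clf_b, clf_c])
    print(voted, hit_ratio(voted))
```

With an odd number of classifiers no ties can occur for binary labels, so the tie-breaking assumption only matters when an even number of classifiers is combined.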
format Online
Article
Text
id pubmed-4381656
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Hindawi Publishing Corporation
record_format MEDLINE/PubMed
spelling pubmed-4381656 2015-04-12 Prediction of Cancer Proteins by Integrating Protein Interaction, Domain Frequency, and Domain Interaction Data Using Machine Learning Algorithms Huang, Chien-Hung Peng, Huai-Shun Ng, Ka-Lok Biomed Res Int Research Article Many proteins are known to be associated with cancer, yet their precise functional role in disease pathogenesis often remains unclear. One strategy for better understanding the function of these proteins is to combine different types of proteomics data. In this study, we extended Aragues's method by employing protein-protein interaction (PPI) data, domain-domain interaction (DDI) data, the weighted domain frequency score (DFS), and cancer linker degree (CLD) data to predict cancer proteins. Performance was benchmarked in three kinds of experiments: (I) using individual algorithms, (II) combining algorithms, and (III) combining algorithms of the same classification type. Compared with Aragues's method, our proposed methods, that is, machine learning algorithms and majority voting, are significantly superior in all seven performance measures. We demonstrated the accuracy of the proposed method on two independent datasets. The best algorithm achieves hit ratios of 89.4% and 72.8% on a lung cancer dataset and a lung cancer microarray study, respectively. It is anticipated that the current research could help in understanding disease mechanisms and in diagnosis. Hindawi Publishing Corporation 2015 2015-03-17 /pmc/articles/PMC4381656/ /pubmed/25866773 http://dx.doi.org/10.1155/2015/312047 Text en Copyright © 2015 Chien-Hung Huang et al. https://creativecommons.org/licenses/by/3.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Prediction of Cancer Proteins by Integrating Protein Interaction, Domain Frequency, and Domain Interaction Data Using Machine Learning Algorithms
topic Research Article