An efficient heuristic method for active feature acquisition and its application to protein-protein interaction prediction

BACKGROUND: Machine learning approaches for classification learn the pattern of the feature space of different classes, or learn a boundary that separates the feature space into different classes. The features of the data instances are usually available, and it is only the class-labels of the instan...

Bibliographic Details
Main Authors: Thahir, Mohamed, Sharma, Tarun, Ganapathiraju, Madhavi K
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3504800/
https://www.ncbi.nlm.nih.gov/pubmed/23173746
http://dx.doi.org/10.1186/1753-6561-6-S7-S2
author Thahir, Mohamed
Sharma, Tarun
Ganapathiraju, Madhavi K
collection PubMed
description BACKGROUND: Machine learning approaches for classification learn the pattern of the feature space of different classes, or learn a boundary that separates the feature space into different classes. The features of the data instances are usually available, and only the class labels of the instances are unavailable. For example, to classify text documents into topic categories, the words in the documents are the features and are readily available, whereas the topic is what is predicted. However, in some domains obtaining features is resource-intensive, so not all features may be available. One example is protein-protein interaction (PPI) prediction, where not only are the labels ('interacting' or 'non-interacting') unavailable, but so are some of the features. It may be possible to obtain at least some of the missing features by carrying out a few experiments as permitted by the available resources. If only a few experiments can be carried out to acquire missing features, which proteins should be studied and which features of those proteins should be determined? From the perspective of machine learning for PPI prediction, it is desirable to acquire those features that, when used to train the classifier, improve its accuracy the most. That is, the utility of a feature acquisition is measured by how much the acquired features improve the accuracy of the classifier. Active feature acquisition (AFA) is a strategy to preselect such instance-feature combinations (i.e. protein and experiment combinations) for maximum utility. The goal of AFA is to create an optimal training set that would result in the best classifier, not to determine the best classification model itself. RESULTS: We present a heuristic method for active feature acquisition that calculates the utility of acquiring a missing feature. This heuristic takes into account the change in belief of the classification model induced by the acquisition of the feature under consideration. Compared to random selection of the proteins on which experiments are performed and of the type of experiment performed, the heuristic method reduces the number of experiments to as few as 40%. The most notable characteristic of this method is that it does not require re-training the classification model on every possible combination of instance, feature and feature-value tuples. For this reason, our method is far less computationally expensive than previous AFA strategies. CONCLUSIONS: The results show that our heuristic method for AFA creates an optimal training set with far fewer features acquired than random acquisition. This shows the value of active feature acquisition in aiding protein-protein interaction prediction where feature acquisition is costly. Compared to previous methods, the proposed method reduces computational cost while also achieving a better F-score. The proposed method is valuable because it presents a direction for AFA with far lower computational expense by removing, for the first time, the need to train a classifier for every combination of instance, feature and feature-value tuples, which would be impractical for several domains.
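The idea of scoring a missing feature by the change in belief it would induce, without re-training, can be illustrated with a minimal sketch. This is not the authors' heuristic, whose exact formula the abstract does not give. It assumes binary features, an already trained logistic model with fixed weights, and a utility defined as the expected absolute change in predicted probability over the two possible feature values. All function names, weights and data below are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

# A binary feature vector; None marks a feature that has not been acquired yet.
Instance = List[Optional[int]]


def feature_means(data: Sequence[Instance]) -> List[float]:
    """Fraction of observed 1s per feature (0.5 if a feature was never observed)."""
    means = []
    for j in range(len(data[0])):
        observed = [row[j] for row in data if row[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.5)
    return means


def predict_proba(weights: Sequence[float], bias: float,
                  x: Instance, means: Sequence[float]) -> float:
    """Belief of a fixed logistic model; missing values are replaced by the mean."""
    z = bias + sum(w * (means[j] if v is None else v)
                   for j, (w, v) in enumerate(zip(weights, x)))
    return 1.0 / (1.0 + math.exp(-z))


def utility(weights: Sequence[float], bias: float, x: Instance,
            j: int, means: Sequence[float]) -> float:
    """Expected |change in belief| if feature j of x were acquired (assumed score)."""
    before = predict_proba(weights, bias, x, means)
    expected_change = 0.0
    for value, prob in ((1, means[j]), (0, 1.0 - means[j])):
        filled = list(x)
        filled[j] = value
        after = predict_proba(weights, bias, filled, means)
        expected_change += prob * abs(after - before)
    return expected_change


def rank_acquisitions(weights: Sequence[float], bias: float,
                      data: Sequence[Instance]) -> List[Tuple[float, int, int]]:
    """Score every missing (instance, feature) pair; the model is never re-trained."""
    means = feature_means(data)
    scored = [(utility(weights, bias, x, j, means), i, j)
              for i, x in enumerate(data)
              for j, v in enumerate(x) if v is None]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    toy_data = [[None, 1], [1, None], [0, 1], [1, 0]]
    for score, i, j in rank_acquisitions([2.0, 0.1], 0.0, toy_data):
        print(f"instance {i}, feature {j}: utility {score:.4f}")
```

Each candidate costs only two forward evaluations of the fixed model, which is the kind of saving the abstract attributes to avoiding re-training for every instance, feature and feature-value combination.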
format Online
Article
Text
id pubmed-3504800
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-35048002012-11-29 An efficient heuristic method for active feature acquisition and its application to protein-protein interaction prediction Thahir, Mohamed Sharma, Tarun Ganapathiraju, Madhavi K BMC Proc Proceedings BioMed Central 2012-11-13 /pmc/articles/PMC3504800/ /pubmed/23173746 http://dx.doi.org/10.1186/1753-6561-6-S7-S2 Text en Copyright ©2012 Thahir et al.; licensee BioMed Central Ltd. 
http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title An efficient heuristic method for active feature acquisition and its application to protein-protein interaction prediction
topic Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3504800/
https://www.ncbi.nlm.nih.gov/pubmed/23173746
http://dx.doi.org/10.1186/1753-6561-6-S7-S2