
An efficient heuristic method for active feature acquisition and its application to protein-protein interaction prediction

BACKGROUND: Machine learning approaches for classification learn the pattern of the feature space of different classes, or learn a boundary that separates the feature space into different classes. The features of the data instances are usually available, and it is only the class-labels of the instan...


Bibliographic Details
Main Authors:	Thahir, Mohamed; Sharma, Tarun; Ganapathiraju, Madhavi K
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2012
Subjects:	Proceedings
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3504800/ https://www.ncbi.nlm.nih.gov/pubmed/23173746 http://dx.doi.org/10.1186/1753-6561-6-S7-S2

author	Thahir, Mohamed; Sharma, Tarun; Ganapathiraju, Madhavi K
author_facet	Thahir, Mohamed; Sharma, Tarun; Ganapathiraju, Madhavi K
author_sort	Thahir, Mohamed
collection	PubMed
description	BACKGROUND: Machine learning approaches for classification learn the pattern of the feature space of different classes, or learn a boundary that separates the feature space into different classes. The features of the data instances are usually available, and it is only the class labels of the instances that are unavailable. For example, to classify text documents into different topic categories, the words in the documents are features and they are readily available, whereas the topic is what is predicted. However, in some domains obtaining features may be resource-intensive, so not all features may be available. An example is protein-protein interaction (PPI) prediction, where not only are the labels ('interacting' or 'non-interacting') unavailable, but so are some of the features. It may be possible to obtain at least some of the missing features by carrying out a few experiments as permitted by the available resources. If only a few experiments can be carried out to acquire missing features, which proteins should be studied and which features of those proteins should be determined? From the perspective of machine learning for PPI prediction, it is desirable to acquire those features that, when used in training the classifier, improve its accuracy the most. That is, the utility of a feature acquisition is measured in terms of how much the acquired features contribute to improving the accuracy of the classifier. Active feature acquisition (AFA) is a strategy to preselect such instance-feature combinations (i.e. protein and experiment combinations) for maximum utility. The goal of AFA is to create an optimal training set that would result in the best classifier, not to determine the best classification model itself. RESULTS: We present a heuristic method for active feature acquisition to calculate the utility of acquiring a missing feature. This heuristic takes into account the change in belief of the classification model induced by the acquisition of the feature under consideration. Compared to random selection of the proteins on which experiments are performed and of the type of experiment performed, the heuristic method reduces the number of experiments to as few as 40%. The most notable characteristic of this method is that it does not require re-training the classification model on every possible combination of instance, feature and feature-value tuples. For this reason, our method is far less computationally expensive than previous AFA strategies. CONCLUSIONS: The results show that our heuristic method for AFA creates an optimal training set with far fewer features acquired than random acquisition. This shows the value of active feature acquisition as an aid to protein-protein interaction prediction, where feature acquisition is costly. Compared to previous methods, the proposed method reduces computational cost while also achieving a better F-score. The proposed method is valuable because it points AFA in a direction of far lower computational expense by removing, for the first time, the need to train a classifier for every combination of instance, feature and feature-value tuples, which would be impractical for several domains.
format	Online Article Text
id	pubmed-3504800
institution	National Center for Biotechnology Information
language	English
publishDate	2012
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3504800 2012-11-29 An efficient heuristic method for active feature acquisition and its application to protein-protein interaction prediction Thahir, Mohamed; Sharma, Tarun; Ganapathiraju, Madhavi K BMC Proc Proceedings BioMed Central 2012-11-13 /pmc/articles/PMC3504800/ /pubmed/23173746 http://dx.doi.org/10.1186/1753-6561-6-S7-S2 Text en Copyright ©2012 Thahir et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	An efficient heuristic method for active feature acquisition and its application to protein-protein interaction prediction
title_sort	efficient heuristic method for active feature acquisition and its application to protein-protein interaction prediction
topic	Proceedings
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3504800/ https://www.ncbi.nlm.nih.gov/pubmed/23173746 http://dx.doi.org/10.1186/1753-6561-6-S7-S2
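
The abstract in this record describes scoring each missing instance-feature pair by the change in the classifier's belief that acquiring it would induce, without re-training the model. The record does not give the paper's actual formula, so the following is only a minimal sketch of that general idea under stated assumptions: features are binary, the classifier is a fixed logistic model, a missing feature is filled with an assumed probability of being 1, and utility is the expected absolute change in the predicted probability. All names (`predict_proba`, `acquisition_utility`, `rank_acquisitions`, `fill`) are hypothetical and not taken from the article.

```python
import math
from typing import List, Optional, Sequence, Tuple

Row = Sequence[Optional[float]]  # None marks a feature that has not been acquired


def predict_proba(weights: Sequence[float], bias: float,
                  x: Row, fill: Sequence[float]) -> float:
    """P(class = 1) under a fixed logistic model; missing values use fill[j]."""
    z = bias
    for w, v, f in zip(weights, x, fill):
        z += w * (f if v is None else v)
    return 1.0 / (1.0 + math.exp(-z))


def acquisition_utility(weights: Sequence[float], bias: float,
                        x: Row, fill: Sequence[float], j: int) -> float:
    """Expected absolute change in belief if binary feature j were acquired.

    Assumption: fill[j] is also used as the probability that feature j is 1.
    The model is never re-trained; it is only re-evaluated per candidate value.
    """
    before = predict_proba(weights, bias, x, fill)
    p_one = fill[j]
    utility = 0.0
    for value, prob in ((0.0, 1.0 - p_one), (1.0, p_one)):
        trial = list(x)
        trial[j] = value
        after = predict_proba(weights, bias, trial, fill)
        utility += prob * abs(after - before)
    return utility


def rank_acquisitions(weights: Sequence[float], bias: float,
                      data: Sequence[Row],
                      fill: Sequence[float]) -> List[Tuple[float, int, int]]:
    """Return (utility, instance index, feature index) sorted by utility, best first."""
    scored = []
    for i, x in enumerate(data):
        for j, v in enumerate(x):
            if v is None:
                scored.append((acquisition_utility(weights, bias, x, fill, j), i, j))
    scored.sort(reverse=True)
    return scored
```

Under these assumptions, the cost per candidate is two model evaluations rather than one re-training run per instance, feature and feature-value combination, which is the kind of saving the abstract attributes to the heuristic.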
