Privacy preserving rule-based classifier using modified artificial bee colony algorithm
Citation
Zorarpacı, E., Ayşe Özel, S. (2021). Privacy preserving rule-based classifier using modified artificial bee colony algorithm. Expert Systems with Applications, 183, art. no. 115437. https://doi.org/10.1016/j.eswa.2021.115437
Abstract
Privacy preserving data mining is an active research area within data mining. Its aim is to prevent the leakage of individuals' sensitive information while data mining techniques are applied. Classification is one of the most studied tasks in data mining, and hence in privacy preserving data mining as well. Differential privacy is a strong privacy guarantee that controls the degree of privacy leakage through the parameter ε and enables researchers to mine data that contains sensitive information. Differentially private implementations of some well-known classification algorithms, such as k-NN, Naïve Bayes, and ID3, have been developed. However, although rule-based classifiers that use meta-heuristics, such as Ant-Miner and BeeMiner, have been shown to be successful in data mining, to the best of our knowledge no differentially private implementation of these classifiers has been proposed in the literature. Artificial bee colony (ABC) is a nature-inspired algorithm that imitates the foraging behavior of bees, and several recently proposed approaches have demonstrated its success in discovering classification rules. Motivated by this gap in the literature, we propose a rule-based classifier that uses the ABC algorithm together with the input perturbation technique of differential privacy to perform privacy preserving classification. According to our experimental results, the proposed ABC-based classifier achieves better classification performance than the well-known algorithms SVM, C4.5, Holte's One Rule, PART, and RIPPER on both the non-private and the differentially private versions of the datasets.
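To make the idea of input perturbation concrete, the following is a minimal sketch of one common way to perturb a numeric attribute before any classifier sees it. It is not the paper's implementation: the abstract does not state which noise mechanism is used, so the Laplace mechanism, the attribute bounds `lower` and `upper`, and the function names `laplace_noise` and `perturb_column` are all assumptions made for illustration.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale).

    The difference of two independent Exp(1) draws follows Laplace(0, 1),
    so multiplying by `scale` gives Laplace(0, scale).
    """
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def perturb_column(values, lower, upper, epsilon, rng):
    """Perturb a bounded numeric attribute with Laplace noise (assumed mechanism).

    Each value is clipped to [lower, upper] and then noised independently.
    With sensitivity = upper - lower, the noise scale is sensitivity / epsilon,
    so a smaller epsilon means more noise and stronger privacy.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    scale = (upper - lower) / epsilon
    perturbed = []
    for v in values:
        clipped = min(max(v, lower), upper)
        perturbed.append(clipped + laplace_noise(scale, rng))
    return perturbed


if __name__ == "__main__":
    rng = random.Random(0)
    ages = [23, 35, 47, 51, 62]  # hypothetical attribute values
    print(perturb_column(ages, lower=0, upper=100, epsilon=1.0, rng=rng))
```

A classifier, rule-based or otherwise, would then be trained on the perturbed values rather than the originals. The trade-off controlled by ε is visible directly in the noise scale: halving ε doubles the expected distortion of every value.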