JOURNAL OF SHANDONG UNIVERSITY(NATURAL SCIENCE) ›› 2017, Vol. 52 ›› Issue (3): 16-23.doi: 10.6040/j.issn.1671-9352.2.2016.053

Previous Articles     Next Articles

Survey on application of data mining via differential privacy

KANG Hai-yan1, MA Yue-lei2   

  1. 1. School of Information Management, Beijing Information Science and Technology University, Beijing 100192, China;
    2. School of Computer Science, Beijing Information Science and Technology University, Beijing 100192, China
  • Received:2016-05-28 Online:2017-03-20 Published:2017-03-20

Abstract: The latest results of differential privacy in data mining were surveyed. The basic concepts of differential privacy were introduced. It analyzes the differential privacys research in pattern mining, classification and cluster. It was focused that on the analysis of the principle of some important technology to achieve. And also it was made that comparative analysis of its strengths and weaknesses and algorithm complexity. Finally, the future research of difference privacy under the dynamic data publication and big data environments was discussed.

Key words: differential privacy, privacy preserving, information security, data mining

CLC Number: 

  • TP309
[1] BACKSTROM L, DWORK C, KLEINBERG J. Wherefore art thou r3579x?: anonymized social networks, hidden patterns, and structural steganography[C] // Proceedings of the 16th International Conference on World Wide Web. New York: ACM, 2007: 181-190.
[2] HOMER N, SZELINGER S, REDMAN M, et al. Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays[J]. Plos Genetics, 2008, 4(8):e1000167.
[3] MACHANAVAJJHALA A, KOROLOVA A, SARMA A D. Personalized social recommendations: accurate or private[C] // Proceedings of the VLDB Endowment VLDB. Endowment Hompage Archive PVLDB, 2011, 4(7):440-450.
[4] NARAYANAN A, SHMATIKOV V. Robust de-anonymization of large sparse datasets[C] // Proceedings of the 2008 IEEE Symposium on Security and Privacy. Washington: IEEE, 2008: 111-125.
[5] WANG R, LI Y F, WANG X F, et al. Learning your identity and disease from research papers: information leaks in genome wide association study[C] // Proceedings of the 16th ACM Conference on Computer and Communications Security. New York: ACM, 2009: 534-544.
[6] 周水庚, 李丰, 陶宇飞, 等. 面向数据库应用的隐私保护研究综述[J]. 计算机学报, 2009,32(5):847-861. ZHOU Shuigeng, LI Feng, TAO Yufei, et al. Privacy preservation in database applications: a Survey[J]. Chinese Journal of Computers, 2009, 32(5):847-861.
[7] EVFIMIEVSKI A, SRIKANT R, AGRAL A, et al. Privacy preserving mining of association rules[C] // Proceedings of the ACM Sigkdd International Conference on Knowledge Discovery and Data Mining(SIGKDD). New York: ACM, 2002: 217-228.
[8] RUBINSTEIN B I P, BARTLETT P L, LING H, et al. Learning in a large function space: privacy-preserving mechanisms for SVM learning[J]. Eprint Arxiv, 2009, 4.
[9] NISSIM K, RASKHODNIKOVA S, SMITH A. Smooth sensitivity and sampling in private data analysis[C] // Proceedings of the Thirty-Ninth Annual ACM Symposium on Theory of Computing. New York: ACM, 2007: 75-84.
[10] FRIEDMAN A, SCHUSTER A. Data mining with differential privacy[C] // Proceedings of the 16th ACM Sigkdd International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2010: 493-502.
[11] MOHAMMED N, CHEN R, FUNG B C M, et al. Differentially private data release for data mining[C] // Proceedings of the 17th ACM Sigkdd International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2011: 493-501.
[12] BHASKAR R, LAXMAN S, SMITH A, et al. Discovering frequent patterns in sensitive data[C] // Proceedings of the 16th ACM Sigkdd International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2010: 503-512.
[13] DING B, WINSLETT M, HAN J, et al. Differentially private data cubes: optimizing noise sources and consistency[C] // Proceedings of the 2011 ACM Sigmod International Conference on Management of Data. New York: ACM, 2011: 217-228.
[14] CHAUDHURI K, MONTELEONI C. Privacy-preserving logistic regression[J]. Advances in Neural Information Processing Systems, 2008: 289-296.
[15] KAMALIKA CHAUDHURI, CLAIRE MONTELEONI, ANAND D S. Differentially private empirical risk minimization[J]. Computer Science, 2011, 12(2):1069-1109.
[16] ADAM N R, WORTMAN J C. Security-control methods for statistical databases[C] // A Comparison Study. ACM Computing Surveys, 1989, 21(4):515-556.
[17] FIENBERG S E, MCINTYRE J. Data swapping: variations on a theme by dalenius and reiss[C] // Proceedings of Privacy in Statistical Databases: Casc Project International Workshop. Spain: Springer, 2004: 38-42.
[18] PINKAS B. Cryptographic techniques for privacy-preserving data mining[J]. Acm Sigkdd Explorations Newsletter, 2003, 4(2):12-19.
[19] AGRAWAL R, SRIKANT R. Privacy preserving data mining[C] // Proceedings of the 2000 ACM Sigmod International Conference on Management of Data. New York: Springer, 2000, 29(2):439-450.
[20] SWEENEY L. k-anonymity: a model for protecting privacy[J]. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 2012, 10(5):557-570.
[21] MACHANNAVAJJHALA A, GEHRKE J, KIFER D, et al. l-diversity: privacy beyond k-anonymity[C] // Proceedings of the International Conference on Data Engineering. IEEE, 2006: 24-24.
[22] LI N, LI T, VENKATASUBRAMANIAN S. t-closeness: privacy beyond k-anonymity and l-diversity[C] // Proceedings of the IEEE International Conference on Data Engineering(ICDE). Istanbul: IEEE, 2007: 106-115.
[23] WONG C W, LI J, FU W C, et al. (a-k)-anonymity: an enhanced k-anonymity model for privacy preserving data publishing[C] // Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2006: 754-759.
[24] XIAO X, TAO Y. m-invariance: towards privacy preserving re-publication of dynamic datasets[C] // Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data. New York: ACM, 2007: 689-700.
[25] 熊平, 朱天清, 王晓峰. 差分隐私保护及其应用[J]. 计算机学报, 2014,1(37):101-122. XIONG Ping, ZHU Tianqing, WANG Xiaofeng. A survey on differential privacy and applications[J]. Chinese Journal of Computers, 2014,1(37):101-122.
[26] DWORK C. Differential privacy[C] // Proceedings of the 33rd International Conference on Automata, Languages and Programming-Volume Part II. Berlin: Springer-Verlag, 2006: 1-12.
[27] DWORK C. Differential privacy: a survey of results[C] // Proceedings of the 5th International Conference on Theory and Applications of Models of Computation. Berlin: Springer-Verlag, 2008: 1-19.
[28] NISSIM K, RASKHODNIKOVA S, SMITH A. Smooth sensitivity and sampling in private data analysis[C] // Proceedings of the Thirty-Ninth Annual ACM Symposium on Theory of Computing. New York: ACM, 2007: 75-84.
[29] DWORK C, MCSHERRY F, NISSIM K. Calibrating noise to sensitivity in private data analysis[C] // Proceedings of the Third Conference on Theory of Cryptography. Berlin: Springer-Verlag, 2006: 265-284.
[30] MCSHERRY F, TALWAR K. Mechanism design via differential privacy[C] // Proceedings of the 48th Annual IEEE Symposium on Foundations of Computer Science(FOCS). Washington: IEEE, 2007: 94-103.
[31] BHASKAR R, LAXMAN S, SMITH A, et al. Discovering frequent patterns in sensitive data[C] // Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2010: 503-512.
[32] LI N, QARDAJA W, SU D, et al. PrivBasis: frequent itemset mining with differential privacy[C] // Proceedings of the VLDB Endowment VLDB Endowment Hompage Archive. New York: ACM, 2012: 1340-1351.
[33] 张啸剑, 王淼, 孟小峰. 差分隐私保护下一种精确挖掘top-k频繁模式方法[J]. 计算机研究与发展,2014,51(1):104-114. ZHANG Xiaojian, WANG Miao, MENG Xiaofeng, et al. An accurate method for mining top-k frequent pattern under differential privacy[J]. Journal of Computer Research and Development, 2014, 51(1):104-114.
[34] HAN J, PEI J, YIN Y, et al. Mining frequent patterns without candidate generation: a frequent-pattern tree approach[J]. Data Mining and Knowledge Discovery, 2004, 8(1):53-87.
[35] HAN X, WANG M, ZHANG X, et al. Differentially private top-k query over mapreduce[C] // Proceedings of the Forth International Workshop on Cloud Data Management. New York: ACM, 2012: 25-32.
[36] XIANG C, SEN S, SHENG Z X, et al. DP-Apriori: a differentially private frequent itemset mining algorithm based on transaction splitting[J]. Computers and Security, 2015, 50:74-90.
[37] BLUM A, DWORK C, MCSHERRY F, et al. Practiacal privacy: the sulq framework[C] // Proceedings of the 24th ACM Sigact-Sigmod-Sigart Symposium on Principles of Databases Systems(PODS). New York: ACM, 2005: 128-138.
[38] MCSHERRY F. Privacy integrated queries: an extensible platform for privacy-preserving data analysis[C] // Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data. New York: ACM, 2009: 10-30.
[39] 赵格, 康海燕. 基于差分隐私的电子商务隐私保护数据发布[J]. 物流工程与管理, 2014, 36(11):119-122. ZHAO Ge, KANG Haiyan. E-commerce privacy protection data release based on differential privacy[J]. Logistics Engineering and Management, 2014, 36(11):119-122.
[40] MOHAMMED N, CHEN R, FUNG B C M, et al. Differentially private data release for data mining[C] // Proceedings of the 17th ACM Sigkdd International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2011: 493-501.
[41] ZHU T, XIONG P, XIANG Y, et al. An effective differentially private data releasing algorithm for decision tree[C] // Proceedings of the 2013 12th IEEE International Conference on Trust, Security and Privacy in Computing and Communications. Washington: IEEE Computer Society, 2013: 388-395.
[42] JAGANNATHAN G, POLLAOPAKKAMNATT K, WRIGHT R N. A practical differentially private random decision tree classifier[J]. Transaction on Data Privacy, 2012, 5(1):273-295.
[43] VAIDYA J, SHAFIQ B, FAN W, et al. A random decision tree framework for privacy preserving data mining[J]. IEEE Transactions on Dependable and Secure Computing, 2014, 11(5):399-411.
[44] PATIL A, SINGH S. Differential private random forest[C] // International Conference on. Advances in Computing, Communications and Informatics, 2014: 2623-2630.
[45] BOJARSKI M, CHOROMANSKA A, CHOROMANSKI K, et al. Differentially-and non-differentially-private random decision trees[J]. Eprint Arxiv, 2014.
[46] FELDMAN D, FIAT A, KAPLAN H, et al. Private coresets[C] // Proceedings of the Forty-First Annual ACM Symposium on Theory of Computing. New York: ACM, 2009: 361-370.
[47] DWORK C. A firm foundation for private data analysis[J]. Communications of the ACM, 2011, 54(1):86-95.
[48] 李杨, 郝志峰, 温雯,等. 差分隐私保护k-means聚类方法研究[J]. 计算机科学, 2013, 40(3):287-290. LI Yang, HAO Zhifeng, WEN Wen, et al. Research on differential privacy preserving k-means clustering[J]. Computer Science, 2013, 40(3):287-290.
[49] 张啸剑, 孟晓峰. 面向数据发布和分析的差分隐私保护[J]. 计算机学报, 2014, 37(4):927-949. ZHANG Xiaojian, MENG Xiaofeng. Differential privacy in data publiction and analysis[J]. Chinese Journal of Computers, 2014, 37(4):927-949.
[50] FAN LIYUE, XIONG Li. An adaptive approach to real-time aggregate monitoring with differential privacy[J]. IEEE Transactions on Knowledge & Data Engineering, 2013, 26(9):2094-2106.
[51] CHAN T H H, SHI E, SONG D. Private and continual release of statistics[J]. ACM Transactions on Information & System Security, 2010, 14(3):405-417.
[52] 冯登国,张敏,李昊.大数据安全与隐私保护.计算机学报,2014,1(37):246-258. FENG Dengguo, ZHANG Min, LI Hao. Big data security and privacy protection[J]. Chinese Journal of Computers, 2014, 1(37):246-258.
[53] SHRIVASTVA KMP, RIZVI MA, SINGH S. Big data privacy based on differential privacy a hope for big data[C] // Proceedings of the 6th International Conference on Computational Intelligence & Communication Networks. India: Bhopal, 2014: 776-781.
[1] YAN Yan, HAO Xiao-hong. Differential privacy partitioning algorithm based on adaptive density grids [J]. JOURNAL OF SHANDONG UNIVERSITY(NATURAL SCIENCE), 2018, 53(9): 12-22.
[2] LI Yan-ping, QI Yan-jiao, ZHANG Kai, WEI Xu-guang. Multi-authority and revocable attribute-based encryption scheme [J]. JOURNAL OF SHANDONG UNIVERSITY(NATURAL SCIENCE), 2018, 53(7): 75-84.
[3] DING Yi-tao, YANG Hai-bin, YANG Xiao-yuan, ZHOU Tan-ping. A reversible image data hiding scheme in Homomorphic encrypted domain [J]. JOURNAL OF SHANDONG UNIVERSITY(NATURAL SCIENCE), 2017, 52(7): 104-110.
[4] BI Xiao-di, LIANG Ying, SHI Hong-zhou, TIAN Hui. Aparameterized location privacy protection method based on two-level Anonymity [J]. JOURNAL OF SHANDONG UNIVERSITY(NATURAL SCIENCE), 2017, 52(5): 75-84.
[5] LIU Xin, XU Qiu-liang, ZHANG Bo. Cooperative group signature scheme with controllable linkability [J]. JOURNAL OF SHANDONG UNIVERSITY(NATURAL SCIENCE), 2016, 51(9): 18-35.
[6] GUO Hua-long, REN Xue-fang, ZHANG Ling. Relationships between dynamic data mining and P-augmented matrix [J]. JOURNAL OF SHANDONG UNIVERSITY(NATURAL SCIENCE), 2016, 51(8): 105-110.
[7] REN Xue-fang, ZHANG Ling. Perturbation theorems of inverse P-sets and perturbation-based data mining [J]. JOURNAL OF SHANDONG UNIVERSITY(NATURAL SCIENCE), 2016, 51(12): 54-60.
[8] WU Zhi-jun,SHEN Dan-dan. Architecture and key technologies of network-enabled next generation global flight tracking based on information integration and sharing [J]. JOURNAL OF SHANDONG UNIVERSITY(NATURAL SCIENCE), 2016, 51(11): 1-6.
[9] ZHANG Ling, REN Xue-fang. Surplus-deficient theorem of cardinal number and data internal-outer mining-separation [J]. JOURNAL OF SHANDONG UNIVERSITY(NATURAL SCIENCE), 2015, 50(08): 90-94.
[10] ZHANG Jing, XUE Leng, CUI Yi, RONG Hui, WANG Jian-ping. Modeling and evaluation of a dual chaotic encryption algorithm for WSN [J]. JOURNAL OF SHANDONG UNIVERSITY(NATURAL SCIENCE), 2015, 50(03): 1-5.
[11] KANG Hai-yan, YANG Kong-yu, CHEN Jian-ming. A method of personalized privacy preservation based on K-anonymization [J]. JOURNAL OF SHANDONG UNIVERSITY(NATURAL SCIENCE), 2014, 49(09): 142-149.
[12] HUANG Jing-wen. uzzy group decision making for information security risk factors analysis [J]. J4, 2012, 47(11): 45-49.
[13] ZHANG Wen-dong1, YIN Jin-huan1, JIA Xiao-fei2, HUANG Chao1, YUAN Yan-mei1. Research of a frequent itemsets mining algorithm based on vector [J]. J4, 2011, 46(3): 31-34.
[14] YE Ming-quan1,2, HU Xue-gang1, WU Chang-rong3. Privacy preserving attribute reduction based on conditional information entropy over vertically partitioned multi-decision tables [J]. J4, 2010, 45(9): 14-19.
[15] SHU Guo-Gong, DAN Bing, GENG Xiao-Na. A clustering algorithm based on feature point selection [J]. J4, 2009, 44(9): 40-42.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!