您的位置:山东大学 -> 科技期刊社 -> 《山东大学学报(理学版)》

山东大学学报(理学版) ›› 2017, Vol. 52 ›› Issue (3): 16-23.doi: 10.6040/j.issn.1671-9352.2.2016.053

• • 上一篇    下一篇

差分隐私保护在数据挖掘中应用综述

康海燕1,马跃雷2   

  1. 1.北京信息科技大学信息管理学院, 北京 100192;2.北京信息科技大学计算机学院, 北京 100192
  • 收稿日期:2016-05-28 出版日期:2017-03-20 发布日期:2017-03-20
  • 作者简介:康海燕(1971— ),男,博士,教授,研究方向为网络安全与隐私保护. E-mail:kanghaiyan@126.com
  • 基金资助:
    国家自然科学基金资助项目(61370139);北京市社会科学基金项目(15JGB099);北京市优秀人才培养资助项目(2013E005007000001)

Survey on application of data mining via differential privacy

KANG Hai-yan1, MA Yue-lei2   

  1. 1. School of Information Management, Beijing Information Science and Technology University, Beijing 100192, China;
    2. School of Computer Science, Beijing Information Science and Technology University, Beijing 100192, China
  • Received:2016-05-28 Online:2017-03-20 Published:2017-03-20

摘要: 针对差分隐私在数据挖掘中的最新成果进行了研究,介绍了差分隐私保护的定义和实现机制,分析了差分隐私在模式挖掘、分类和聚类中的相关研究,着重解析了部分重要技术的实现原理,对比分析了其优缺点和算法复杂度。最后,展望了差分隐私在动态数据发布和大数据环境下的研究方向。

关键词: 隐私保护, 信息安全, 差分隐私, 数据挖掘

Abstract: The latest results of differential privacy in data mining were surveyed. The basic concepts of differential privacy were introduced. It analyzes the differential privacys research in pattern mining, classification and cluster. It was focused that on the analysis of the principle of some important technology to achieve. And also it was made that comparative analysis of its strengths and weaknesses and algorithm complexity. Finally, the future research of difference privacy under the dynamic data publication and big data environments was discussed.

Key words: differential privacy, privacy preserving, information security, data mining

中图分类号: 

  • TP309
[1] BACKSTROM L, DWORK C, KLEINBERG J. Wherefore art thou r3579x?: anonymized social networks, hidden patterns, and structural steganography[C] // Proceedings of the 16th International Conference on World Wide Web. New York: ACM, 2007: 181-190.
[2] HOMER N, SZELINGER S, REDMAN M, et al. Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays[J]. Plos Genetics, 2008, 4(8):e1000167.
[3] MACHANAVAJJHALA A, KOROLOVA A, SARMA A D. Personalized social recommendations: accurate or private[C] // Proceedings of the VLDB Endowment VLDB. Endowment Hompage Archive PVLDB, 2011, 4(7):440-450.
[4] NARAYANAN A, SHMATIKOV V. Robust de-anonymization of large sparse datasets[C] // Proceedings of the 2008 IEEE Symposium on Security and Privacy. Washington: IEEE, 2008: 111-125.
[5] WANG R, LI Y F, WANG X F, et al. Learning your identity and disease from research papers: information leaks in genome wide association study[C] // Proceedings of the 16th ACM Conference on Computer and Communications Security. New York: ACM, 2009: 534-544.
[6] 周水庚, 李丰, 陶宇飞, 等. 面向数据库应用的隐私保护研究综述[J]. 计算机学报, 2009,32(5):847-861. ZHOU Shuigeng, LI Feng, TAO Yufei, et al. Privacy preservation in database applications: a Survey[J]. Chinese Journal of Computers, 2009, 32(5):847-861.
[7] EVFIMIEVSKI A, SRIKANT R, AGRAL A, et al. Privacy preserving mining of association rules[C] // Proceedings of the ACM Sigkdd International Conference on Knowledge Discovery and Data Mining(SIGKDD). New York: ACM, 2002: 217-228.
[8] RUBINSTEIN B I P, BARTLETT P L, LING H, et al. Learning in a large function space: privacy-preserving mechanisms for SVM learning[J]. Eprint Arxiv, 2009, 4.
[9] NISSIM K, RASKHODNIKOVA S, SMITH A. Smooth sensitivity and sampling in private data analysis[C] // Proceedings of the Thirty-Ninth Annual ACM Symposium on Theory of Computing. New York: ACM, 2007: 75-84.
[10] FRIEDMAN A, SCHUSTER A. Data mining with differential privacy[C] // Proceedings of the 16th ACM Sigkdd International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2010: 493-502.
[11] MOHAMMED N, CHEN R, FUNG B C M, et al. Differentially private data release for data mining[C] // Proceedings of the 17th ACM Sigkdd International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2011: 493-501.
[12] BHASKAR R, LAXMAN S, SMITH A, et al. Discovering frequent patterns in sensitive data[C] // Proceedings of the 16th ACM Sigkdd International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2010: 503-512.
[13] DING B, WINSLETT M, HAN J, et al. Differentially private data cubes: optimizing noise sources and consistency[C] // Proceedings of the 2011 ACM Sigmod International Conference on Management of Data. New York: ACM, 2011: 217-228.
[14] CHAUDHURI K, MONTELEONI C. Privacy-preserving logistic regression[J]. Advances in Neural Information Processing Systems, 2008: 289-296.
[15] KAMALIKA CHAUDHURI, CLAIRE MONTELEONI, ANAND D S. Differentially private empirical risk minimization[J]. Computer Science, 2011, 12(2):1069-1109.
[16] ADAM N R, WORTMAN J C. Security-control methods for statistical databases[C] // A Comparison Study. ACM Computing Surveys, 1989, 21(4):515-556.
[17] FIENBERG S E, MCINTYRE J. Data swapping: variations on a theme by dalenius and reiss[C] // Proceedings of Privacy in Statistical Databases: Casc Project International Workshop. Spain: Springer, 2004: 38-42.
[18] PINKAS B. Cryptographic techniques for privacy-preserving data mining[J]. Acm Sigkdd Explorations Newsletter, 2003, 4(2):12-19.
[19] AGRAWAL R, SRIKANT R. Privacy preserving data mining[C] // Proceedings of the 2000 ACM Sigmod International Conference on Management of Data. New York: Springer, 2000, 29(2):439-450.
[20] SWEENEY L. k-anonymity: a model for protecting privacy[J]. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 2012, 10(5):557-570.
[21] MACHANNAVAJJHALA A, GEHRKE J, KIFER D, et al. l-diversity: privacy beyond k-anonymity[C] // Proceedings of the International Conference on Data Engineering. IEEE, 2006: 24-24.
[22] LI N, LI T, VENKATASUBRAMANIAN S. t-closeness: privacy beyond k-anonymity and l-diversity[C] // Proceedings of the IEEE International Conference on Data Engineering(ICDE). Istanbul: IEEE, 2007: 106-115.
[23] WONG C W, LI J, FU W C, et al. (a-k)-anonymity: an enhanced k-anonymity model for privacy preserving data publishing[C] // Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2006: 754-759.
[24] XIAO X, TAO Y. m-invariance: towards privacy preserving re-publication of dynamic datasets[C] // Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data. New York: ACM, 2007: 689-700.
[25] 熊平, 朱天清, 王晓峰. 差分隐私保护及其应用[J]. 计算机学报, 2014,1(37):101-122. XIONG Ping, ZHU Tianqing, WANG Xiaofeng. A survey on differential privacy and applications[J]. Chinese Journal of Computers, 2014,1(37):101-122.
[26] DWORK C. Differential privacy[C] // Proceedings of the 33rd International Conference on Automata, Languages and Programming-Volume Part II. Berlin: Springer-Verlag, 2006: 1-12.
[27] DWORK C. Differential privacy: a survey of results[C] // Proceedings of the 5th International Conference on Theory and Applications of Models of Computation. Berlin: Springer-Verlag, 2008: 1-19.
[28] NISSIM K, RASKHODNIKOVA S, SMITH A. Smooth sensitivity and sampling in private data analysis[C] // Proceedings of the Thirty-Ninth Annual ACM Symposium on Theory of Computing. New York: ACM, 2007: 75-84.
[29] DWORK C, MCSHERRY F, NISSIM K. Calibrating noise to sensitivity in private data analysis[C] // Proceedings of the Third Conference on Theory of Cryptography. Berlin: Springer-Verlag, 2006: 265-284.
[30] MCSHERRY F, TALWAR K. Mechanism design via differential privacy[C] // Proceedings of the 48th Annual IEEE Symposium on Foundations of Computer Science(FOCS). Washington: IEEE, 2007: 94-103.
[31] BHASKAR R, LAXMAN S, SMITH A, et al. Discovering frequent patterns in sensitive data[C] // Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2010: 503-512.
[32] LI N, QARDAJA W, SU D, et al. PrivBasis: frequent itemset mining with differential privacy[C] // Proceedings of the VLDB Endowment VLDB Endowment Hompage Archive. New York: ACM, 2012: 1340-1351.
[33] 张啸剑, 王淼, 孟小峰. 差分隐私保护下一种精确挖掘top-k频繁模式方法[J]. 计算机研究与发展,2014,51(1):104-114. ZHANG Xiaojian, WANG Miao, MENG Xiaofeng, et al. An accurate method for mining top-k frequent pattern under differential privacy[J]. Journal of Computer Research and Development, 2014, 51(1):104-114.
[34] HAN J, PEI J, YIN Y, et al. Mining frequent patterns without candidate generation: a frequent-pattern tree approach[J]. Data Mining and Knowledge Discovery, 2004, 8(1):53-87.
[35] HAN X, WANG M, ZHANG X, et al. Differentially private top-k query over mapreduce[C] // Proceedings of the Forth International Workshop on Cloud Data Management. New York: ACM, 2012: 25-32.
[36] XIANG C, SEN S, SHENG Z X, et al. DP-Apriori: a differentially private frequent itemset mining algorithm based on transaction splitting[J]. Computers and Security, 2015, 50:74-90.
[37] BLUM A, DWORK C, MCSHERRY F, et al. Practiacal privacy: the sulq framework[C] // Proceedings of the 24th ACM Sigact-Sigmod-Sigart Symposium on Principles of Databases Systems(PODS). New York: ACM, 2005: 128-138.
[38] MCSHERRY F. Privacy integrated queries: an extensible platform for privacy-preserving data analysis[C] // Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data. New York: ACM, 2009: 10-30.
[39] 赵格, 康海燕. 基于差分隐私的电子商务隐私保护数据发布[J]. 物流工程与管理, 2014, 36(11):119-122. ZHAO Ge, KANG Haiyan. E-commerce privacy protection data release based on differential privacy[J]. Logistics Engineering and Management, 2014, 36(11):119-122.
[40] MOHAMMED N, CHEN R, FUNG B C M, et al. Differentially private data release for data mining[C] // Proceedings of the 17th ACM Sigkdd International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2011: 493-501.
[41] ZHU T, XIONG P, XIANG Y, et al. An effective differentially private data releasing algorithm for decision tree[C] // Proceedings of the 2013 12th IEEE International Conference on Trust, Security and Privacy in Computing and Communications. Washington: IEEE Computer Society, 2013: 388-395.
[42] JAGANNATHAN G, POLLAOPAKKAMNATT K, WRIGHT R N. A practical differentially private random decision tree classifier[J]. Transaction on Data Privacy, 2012, 5(1):273-295.
[43] VAIDYA J, SHAFIQ B, FAN W, et al. A random decision tree framework for privacy preserving data mining[J]. IEEE Transactions on Dependable and Secure Computing, 2014, 11(5):399-411.
[44] PATIL A, SINGH S. Differential private random forest[C] // International Conference on. Advances in Computing, Communications and Informatics, 2014: 2623-2630.
[45] BOJARSKI M, CHOROMANSKA A, CHOROMANSKI K, et al. Differentially-and non-differentially-private random decision trees[J]. Eprint Arxiv, 2014.
[46] FELDMAN D, FIAT A, KAPLAN H, et al. Private coresets[C] // Proceedings of the Forty-First Annual ACM Symposium on Theory of Computing. New York: ACM, 2009: 361-370.
[47] DWORK C. A firm foundation for private data analysis[J]. Communications of the ACM, 2011, 54(1):86-95.
[48] 李杨, 郝志峰, 温雯,等. 差分隐私保护k-means聚类方法研究[J]. 计算机科学, 2013, 40(3):287-290. LI Yang, HAO Zhifeng, WEN Wen, et al. Research on differential privacy preserving k-means clustering[J]. Computer Science, 2013, 40(3):287-290.
[49] 张啸剑, 孟晓峰. 面向数据发布和分析的差分隐私保护[J]. 计算机学报, 2014, 37(4):927-949. ZHANG Xiaojian, MENG Xiaofeng. Differential privacy in data publiction and analysis[J]. Chinese Journal of Computers, 2014, 37(4):927-949.
[50] FAN LIYUE, XIONG Li. An adaptive approach to real-time aggregate monitoring with differential privacy[J]. IEEE Transactions on Knowledge & Data Engineering, 2013, 26(9):2094-2106.
[51] CHAN T H H, SHI E, SONG D. Private and continual release of statistics[J]. ACM Transactions on Information & System Security, 2010, 14(3):405-417.
[52] 冯登国,张敏,李昊.大数据安全与隐私保护.计算机学报,2014,1(37):246-258. FENG Dengguo, ZHANG Min, LI Hao. Big data security and privacy protection[J]. Chinese Journal of Computers, 2014, 1(37):246-258.
[53] SHRIVASTVA KMP, RIZVI MA, SINGH S. Big data privacy based on differential privacy a hope for big data[C] // Proceedings of the 6th International Conference on Computational Intelligence & Communication Networks. India: Bhopal, 2014: 776-781.
[1] 晏燕,郝晓弘. 差分隐私密度自适应网格划分发布方法[J]. 山东大学学报(理学版), 2018, 53(9): 12-22.
[2] 李艳平,齐艳姣,张凯,魏旭光. 支持用户撤销的多授权机构的属性加密方案[J]. 山东大学学报(理学版), 2018, 53(7): 75-84.
[3] 康海燕,黄渝轩,陈楚翘. 基于视频分析的地理信息隐私保护方法[J]. 山东大学学报(理学版), 2018, 53(1): 19-29.
[4] 丁义涛,杨海滨,杨晓元,周潭平. 一种同态密文域可逆隐藏方案[J]. 山东大学学报(理学版), 2017, 52(7): 104-110.
[5] 毕晓迪,梁英,史红周,田辉. 一种基于隐私偏好的二次匿名位置隐私保护方法[J]. 山东大学学报(理学版), 2017, 52(5): 75-84.
[6] 李宇溪,王恺璇,林慕清,周福才. 基于匿名广播加密的P2P社交网络隐私保护系统[J]. 山东大学学报(理学版), 2016, 51(9): 84-91.
[7] 柳欣,徐秋亮,张波. 满足可控关联性的合作群签名方案[J]. 山东大学学报(理学版), 2016, 51(9): 18-35.
[8] 查明明,王伟. FlowMonitor: Android隐私数据流向监控防护系统[J]. 山东大学学报(理学版), 2016, 51(9): 59-67.
[9] 蔡红云,马晓雪. 在线社会网络中基于关系强度的访问控制机制[J]. 山东大学学报(理学版), 2016, 51(7): 90-97.
[10] 吴志军,沈丹丹. 基于信息综合集成共享的下一代网络化全球航班追踪体系结构及关键技术[J]. 山东大学学报(理学版), 2016, 51(11): 1-6.
[11] 张凌, 任雪芳. 基数余-亏定理与数据外-内挖掘-分离[J]. 山东大学学报(理学版), 2015, 50(08): 90-94.
[12] 张晶, 薛冷, 崔毅, 容会, 王剑平. 基于无线传感器网络的双混沌数据加密算法建模与评价[J]. 山东大学学报(理学版), 2015, 50(03): 1-5.
[13] 蔡红云, 田俊峰. 云计算中的数据隐私保护研究[J]. 山东大学学报(理学版), 2014, 49(09): 83-89.
[14] 杨松涛, 马春光, 周长利, 张宗利. 一种地理围栏服务中的LBS隐私保护方法[J]. 山东大学学报(理学版), 2014, 49(09): 69-73.
[15] 吴熙曦, 李炳龙, 张天琪. 基于KNN的Android智能手机微信取证方法[J]. 山东大学学报(理学版), 2014, 49(09): 150-153.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!