
Journal of Shandong University (Natural Science), 2019, Vol. 54, Issue (3): 10-17, 27. doi: 10.6040/j.issn.1671-9352.2.2018.084


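Sensitive attribute iterative inference method for social network users

Xiao-jie XIE1,2, Ying LIANG1,*, Xiang-xiang DONG1,2

  1. Research Center for Ubiquitous Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
    2. School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2018-09-20  Issue date: 2019-03-20  Released online: 2019-03-19
  • Corresponding author: Ying LIANG  E-mail: mailbox_of_xxj@126.com; liangy@ict.ac.cn
  • About the first author: XIE Xiao-jie (born 1997), male, master's student; research interest: data mining. E-mail: mailbox_of_xxj@126.com
  • Funding:
    National Key Research and Development Program of China (2018YFB1004704); National Key Research and Development Program of China (2016YFB0800403)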


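Abstract:

Analyzing and inferring the sensitive information of social network users helps to quantify the degree of privacy leakage technically and supports privacy protection. Existing user attribute inference methods require strong assumptions about the values that user attributes take. To address this problem, an iterative inference method for the sensitive attributes of social network users is proposed, combining the relaxation labeling (RL) iterative classification framework with an extended weighted-vote relational neighbor (wvRN) relational inference method. A convolutional neural network extracts features from user text to produce initial inferences, which are then refined iteratively through relational inference over neighboring nodes. This weakens the assumptions about user attributes and improves usability. Experimental results show that, with only a small amount of labeled data from the social network and reasonable parameter settings for the iterative method, good inference results for users' sensitive attributes can be obtained.

Key words: social network, text classification, social link, attribute inference, data mining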
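The abstract combines two ingredients: relaxation labeling (RL) as the iterative framework and weighted-vote relational neighbor (wvRN) as the relational classifier. The sketch below shows how they can fit together, following the standard formulation of Macskassy and Provost (ref. 17): an unlabeled user's class distribution is the edge-weighted average of its neighbors' current distributions, and each new estimate is blended with the previous one using a weight β that starts at k and decays by a factor α per iteration. The paper's own extension of wvRN is not described here, so initializing unlabeled users from text-classifier probabilities (`priors`), the function name `wvrn_relaxation_labeling`, the edge weights, and the defaults k = 1.0, α = 0.99 and 50 iterations are all assumptions for illustration, not the authors' settings.

```python
from typing import Dict, List, Tuple

Dist = Dict[str, float]  # class label -> probability


def wvrn_relaxation_labeling(
    edges: List[Tuple[str, str, float]],
    labeled: Dict[str, str],
    priors: Dict[str, Dist],
    classes: List[str],
    iterations: int = 50,
    k: float = 1.0,
    alpha: float = 0.99,
) -> Dict[str, Dist]:
    """Hypothetical sketch: collective inference with wvRN under relaxation labeling."""
    # Undirected, weighted adjacency list built from (u, v, weight) triples.
    neighbors: Dict[str, List[Tuple[str, float]]] = {}
    for u, v, w in edges:
        neighbors.setdefault(u, []).append((v, w))
        neighbors.setdefault(v, []).append((u, w))

    # Labeled users are clamped to a one-hot distribution and never updated.
    est: Dict[str, Dist] = {
        u: {c: (1.0 if c == y else 0.0) for c in classes} for u, y in labeled.items()
    }
    # Assumption: unlabeled users start from text-classifier probabilities.
    unlabeled = [u for u in priors if u not in labeled]
    for u in unlabeled:
        est[u] = dict(priors[u])

    beta = k
    for _ in range(iterations):
        beta *= alpha  # annealing: later iterations move the estimates less
        new_est: Dict[str, Dist] = {}
        for u in unlabeled:
            # wvRN: edge-weighted sum of the neighbours' current distributions.
            scores = {c: 0.0 for c in classes}
            for v, w in neighbors.get(u, []):
                if v in est:
                    for c in classes:
                        scores[c] += w * est[v][c]
            total = sum(scores.values())
            if total == 0.0:
                new_est[u] = est[u]  # no informative neighbours: keep the estimate
                continue
            # RL update: blend the normalised wvRN vote with the previous estimate.
            new_est[u] = {
                c: beta * scores[c] / total + (1.0 - beta) * est[u][c] for c in classes
            }
        est.update(new_est)  # synchronous update after a full sweep
    return est
```

For example, a user `a` linked to a labeled male user with weight 1 and a labeled female user with weight 3 converges to roughly 0.25 male / 0.75 female, whatever the text prior, because both neighbors are fixed; the prior matters more when neighbors are themselves unlabeled.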


CLC number: TP309.2

Figure 1 Iterative inference process

Figure 2 Example of the TextCNN architecture
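Figure 2 refers to TextCNN (Kim, ref. 18), which the method uses to turn user text into attribute probabilities. The sketch below shows only the forward pass of that architecture in plain Python with hand-set weights: each filter slides over a window of consecutive word embeddings, a ReLU and max-over-time pooling reduce it to a single feature, and a fully connected layer with softmax maps the pooled features to a distribution over attribute values (for example, gender). Training, dropout, the embedding lookup, and the paper's actual filter sizes and dimensions are not given here; the function names and toy weights are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]  # one row per word, one column per embedding dimension


def conv_max_pool(emb: Matrix, filt: Matrix, bias: float) -> float:
    """One convolution filter spanning len(filt) consecutive words,
    followed by ReLU and max-over-time pooling."""
    height = len(filt)
    best = 0.0  # ReLU outputs are >= 0; also the result for texts shorter than the filter
    for start in range(len(emb) - height + 1):
        s = bias
        for r in range(height):
            for d, weight in enumerate(filt[r]):
                s += weight * emb[start + r][d]
        best = max(best, s)  # equals max(best, relu(s)) because best >= 0
    return best


def softmax(z: List[float]) -> List[float]:
    m = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in z]
    total = sum(exps)
    return [e / total for e in exps]


def textcnn_forward(
    emb: Matrix,
    filters: List[Matrix],
    biases: List[float],
    out_w: Matrix,
    out_b: List[float],
) -> List[float]:
    """Class probabilities for one user's text (inference only, no dropout)."""
    # One pooled feature per filter; filters may differ in height (n-gram size).
    features = [conv_max_pool(emb, f, b) for f, b in zip(filters, biases)]
    # Fully connected layer, then softmax over the attribute values.
    logits = [sum(w * x for w, x in zip(row, features)) + b for row, b in zip(out_w, out_b)]
    return softmax(logits)
```

The resulting probability vector is the kind of per-user estimate that an iterative relational step could start from; in a real system the filters and output weights would be learned from the labeled users.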

Table 1 Dataset split details

Labeling rate  Training users  Test users
0.1            3,220           28,983
0.2            6,440           25,763
0.3            9,660           22,543
0.4            12,881          19,322
0.5            16,101          16,102
0.6            22,542          9,661
0.7            22,542          9,661
0.8            25,762          6,441
0.9            28,982          3,221
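Every row of Table 1 sums to 32,203 users, so the labeling rate is the fraction of all users placed in the training set. The short check below is an illustrative sketch that assumes the training size is floor(rate × 32,203) with the remaining users as the test set; the helper name `split_sizes` is hypothetical. Under this assumption the rule reproduces every row except 0.6, which in the table repeats the 0.7 values and would instead give 19,321 training and 12,882 test users.

```python
from typing import Tuple

TOTAL_USERS = 32_203  # training + test users in every row of Table 1


def split_sizes(percent: int, total: int = TOTAL_USERS) -> Tuple[int, int]:
    """Return (training users, test users) for a labeling rate given in percent.

    Assumption: training size = floor(rate * total); integer arithmetic
    avoids floating-point rounding surprises.
    """
    train = total * percent // 100
    return train, total - train


if __name__ == "__main__":
    for pct in range(10, 100, 10):
        train, test = split_sizes(pct)
        print(f"{pct / 100:.1f}  {train:>6,}  {test:>6,}")
```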

Figure 3 Inference accuracy for gender

Figure 4 Inference accuracy for province

Figure 5 Inference accuracy for city

References:
1 ACQUISTI A, GROSS R. Imagined communities: awareness, information sharing, and privacy on the Facebook[M]// ACQUISTI A, GROSS R. eds. Privacy Enhancing Technologies. Berlin: Springer, 2006: 36-58.
2 LI Xiaoxue, CAO Yannan, SHANG Yanmin, et al. Inferring user profiles in online social networks based on convolutional neural network[C]// Proceedings of International Conference on Knowledge Science, Engineering and Management (KSEM). New York: Springer, 2017: 274-286.
3 XU W H, ZHOU X, LI L. Inferring privacy information via social relations[C]// 2008 IEEE 24th International Conference on Data Engineering Workshop. Mexico: ICDE, 2008: 525-530.
4 LINDAMOOD J, HEATHERLY R, KANTARCIOGLU M, et al. Inferring private information using social network data[C]// International Conference on World Wide Web. New York: ACM, 2008.
5 JIA Jinyuan, WANG Binghui, ZHANG Le, et al. AttriInfer: inferring user attributes in online social networks using Markov random fields[C]// Proceedings of International World Wide Web Conferences (WWW). New York: ACM, 2017: 1561-1569.
6 DONG Y X, TANG J, WU S, et al. Link prediction and recommendation across heterogeneous social networks[C]// IEEE International Conference on Data Mining. New York: IEEE, 2013.
7 CHAUDHARI G, AVADHANULA V, SARAWAGI S. A few good predictions: selective node labeling in a social network[C]// ACM International Conference on Web Search & Data Mining. [s.l.]: ACM, 2014.
8 ZHELEVA E, GETOOR L. To join or not to join: the illusion of privacy in social networks with mixed public and private user profiles[C]// Proceedings of International World Wide Web Conferences (WWW). New York: ACM, 2009: 531-540.
9 BACKSTROM L, SUN E, MARLOW C. Find me if you can: improving geographical prediction with social and spatial proximity[C]// Proceedings of International World Wide Web Conferences (WWW). North Carolina: ACM, 2010: 61-70.
10 LI Rui, WANG Chi, CHANG Chenchuan. User profiling in an ego network: co-profiling attributes and relationships[C]// Proceedings of International World Wide Web Conferences (WWW). New York: ACM, 2014: 819-830.
11 DOUGNON R Y, FOURNIER-VIGER P, LIN J C W, et al. Inferring social network user profiles using a partial social graph[J]. Journal of Intelligent Information Systems, 2016, 47(2): 313-344. doi: 10.1007/s10844-016-0402-y
12 WONG R K, VIDYALAKSHMI B S. Privacy leakage via attribute inference in directed social networks[C]// Proceedings of International Conference on Information and Communications Security (ICICS). Berlin: Springer, 2016: 333-346.
13 HU Kaixian, LIANG Ying, XU Hongbo, et al. A method for social network user identity feature recognition[J]. Journal of Computer Research and Development, 2016, 53(11): 2630-2644. doi: 10.7544/issn1000-1239.2016.20150219
14 GONG Zhenqiang, LIU Bin. You are who you know and how you behave: attribute inference attacks via users' social friends and behaviors[J]. arXiv: 1606.05893. https://arxiv.org/abs/1606.05893
15 CHAABANE A, ACS G, KAAFAR M A. You are what you like! Information leakage through users' interests[C]// Proceedings of ISOC Network and Distributed System Security Symposium (NDSS). San Diego: [s.n.], 2012.
16 MISLOVE A, VISWANATH B, GUMMADI K P, et al. You are who you know: inferring user profiles in online social networks[C]// Proceedings of ACM International Conference on Web Search and Data Mining (WSDM). New York: ACM, 2010: 251-260.
17 MACSKASSY S A, PROVOST F. Classification in networked data: a toolkit and a univariate case study[J]. Journal of Machine Learning Research, 2007, 8(3): 1-41.
18 KIM Y. Convolutional neural networks for sentence classification[C]// Proceedings of Conference on Empirical Methods in Natural Language Processing (EMNLP). Doha: ACL, 2014: 1746-1751.