Journal of Shandong University (Natural Science) ›› 2015, Vol. 50 ›› Issue (09): 36-41. doi: 10.6040/j.issn.1671-9352.3.2014.287

• Article •

Micro-blogging topic mining based on supervised LDA user interest model

WANG Li-ren1,2, YU Zheng-tao1,2, WANG Yan-bing1,2, GAO Sheng-xiang1,2, LI Xian-hui1,2

  1. School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, Yunnan, China;
  2. Intelligent Information Processing Key Laboratory, Kunming University of Science and Technology, Kunming 650500, Yunnan, China
  • Received: 2015-03-03 Revised: 2015-07-22 Online: 2015-09-20 Published: 2015-09-26
  • About the author: WANG Li-ren (1990-), male, master's student; research interest: natural language processing. E-mail: wlr901112@163.com
  • Supported by:
    National Natural Science Foundation of China (61175068)

Abstract: The Micro-blogging content that users publish reflects their interests, and Micro-blogging behaviors (forwarding, commenting, replying, and comments received from others) provide strong guidance for discovering those interests. To make effective use of these behaviors, we propose a user interest modeling method for Micro-blogging content based on supervised LDA (latent Dirichlet allocation). First, by analyzing how the four factors of forwarding, commenting, replying, and comments from others influence a user's interest topics, four constraint relations are defined. Second, based on the user's Micro-blogging content, the four constraint relations are incorporated into the LDA model to construct a supervised LDA Micro-blogging topic generation model. Finally, the user's topic distribution is obtained, which yields the user interest model. The experimental results show that, compared with the LDA model, this method achieves considerably higher accuracy, and that the four kinds of introduced information play an important guiding role in discovering Micro-blogging users' interests.

Key words: interest mining, Micro-blogging behavior, supervised LDA, Micro-blogging content

CLC Number:

  • TP393