
Journal of Shandong University (Natural Science) ›› 2014, Vol. 49 ›› Issue (11): 31-36. doi: 10.6040/j.issn.1671-9352.3.2014.305

• Article •

Personalized ranking of Micro-blogging forwarders

KUANG Chong, LIU Zhi-yuan, SUN Mao-song

  1. State Key Laboratory of Intelligent Technology and Systems; Tsinghua National Laboratory for Information Science and Technology; Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
  • Received: 2014-08-28 Revised: 2014-10-24 Online: 2014-11-20 Published: 2014-11-25
  • About the author: KUANG Chong (1989- ), male, master's student; research interests: natural language processing and social computing. E-mail: kuangchong07@gmail.com
  • Supported by:
    National Natural Science Foundation of China (61170196, 61202140)

Abstract: Reposting is the main form of information diffusion on micro-blogging platforms. Most existing work focuses on analyzing and predicting repost behavior, whereas the problem of finding the users most likely to repost a given microblog has not been well solved. This paper combines the Bayesian Personalized Ranking optimization criterion (BPR-OPT) with Factorization Machines (FM) to propose a general method for predicting the reposters of a microblog. The features that influence whether a user becomes a reposter are analyzed in detail, and reposters are then predicted from these features on a large-scale real-world dataset. Experiments show that the method clearly improves prediction performance and confirm that pair-wise, feature-aware methods solve the reposter prediction problem more effectively.

Key words: micro-blog, personalized ranking, repost
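
The combination of BPR-OPT and FM named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `fm_score` and `bpr_pair_loss`, the dense list representation, and the toy feature vectors are all assumptions made for illustration. It shows a second-order factorization machine score and the pair-wise BPR loss, which is small when a user who reposted is scored above a user who did not.

```python
import math


def fm_score(x, w0, w, V):
    """Second-order factorization machine score for a dense feature vector.

    x  : list of n feature values (hypothetical microblog/user features)
    w0 : global bias
    w  : list of n linear weights
    V  : n rows, each a list of k latent factors
    """
    n = len(x)
    linear = w0 + sum(w[i] * x[i] for i in range(n))
    k = len(V[0]) if V else 0
    pairwise = 0.0
    for f in range(k):
        # Uses the identity
        #   sum_{i<j} v_if v_jf x_i x_j
        #     = 0.5 * ((sum_i v_if x_i)^2 - sum_i (v_if x_i)^2)
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise


def bpr_pair_loss(x_pos, x_neg, w0, w, V):
    """BPR loss for one pair: -ln(sigmoid(score(x_pos) - score(x_neg))).

    x_pos describes a (microblog, user) pair where the user reposted,
    x_neg a pair where the user did not. Regularization is omitted.
    """
    diff = fm_score(x_pos, w0, w, V) - fm_score(x_neg, w0, w, V)
    # Numerically stable form of ln(1 + exp(-diff)).
    if diff > 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))
```

In a full training loop, the loss would be minimized over many sampled (reposter, non-reposter) pairs with a regularization term added; candidates for a new microblog would then be ranked by `fm_score`.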

CLC Number: 

  • TP391