JOURNAL OF SHANDONG UNIVERSITY (NATURAL SCIENCE), 2020, Vol. 55, Issue (1): 94-101. doi: 10.6040/j.issn.1671-9352.1.2018.136


Topic tag popularity prediction based on multi-dimensional features

Xin-le WANG (1,2), Wen-feng YANG (3), Hua-ming LIAO (1), Yong-qing WANG (1), Yue LIU (1), Xiao-ming YU (1), Xue-qi CHENG (1)

  1. CAS Key Laboratory of Network Data Science and Technology, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
  2. Graduate University of Chinese Academy of Sciences, Beijing 100049, China
  3. China Mobile (Suzhou) Software Technology Co. LTD, Suzhou 215000, Jiangsu, China
  • Received: 2018-10-17 Online: 2020-01-20 Published: 2020-01-10

Abstract:

User network structure information and topic tag characteristics such as sentiment and regionality are analyzed. A popularity prediction model is proposed that combines the structural features of the user's fan network with the topic tag's own features. Experiments show that the newly proposed features are effective and offer useful reference value for predicting the future popularity of topic tags.

Key words: topic tag, popularity prediction, multiple features

CLC Number: 

  • TP391

Fig.1

Time frequency distribution of hashtags

Table 1

Weibo data

Attribute | Value
File size/GB | 130
Number of Weibo messages | 2×10^8
Number of hashtags | 5 923 117

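Table 2

Hashtag training data

Attribute | Value
File size/GB | 1.1
Number of training samples | 370 633
Number of feature types | 7
Weibo feature dimensions | 7
Hashtag feature dimensions | 4
Time-series feature dimensions | 30
Region-level regional feature dimensions | 8
Province-level regional feature dimensions | 35
Topic feature dimensions | 60
User vector representation dimensions | 100
User self feature dimensions | 2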
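
The dimensions listed in Table 2 can be read as blocks of a single feature vector per training sample. The following is a minimal Python sketch under the assumption that the blocks are simply concatenated; the page does not state how the authors assemble them, and the block names are hypothetical labels chosen for illustration.

```python
# Feature block sizes as listed in Table 2.
# The dictionary keys are illustrative labels, not names from the paper.
FEATURE_DIMS = {
    "weibo": 7,
    "hashtag": 4,
    "time_series": 30,
    "region": 8,
    "province": 35,
    "topic": 60,
    "user_embedding": 100,
    "user_self": 2,
}


def concat_features(blocks):
    """Concatenate per-block feature lists in the fixed order of FEATURE_DIMS."""
    vector = []
    for name, dim in FEATURE_DIMS.items():
        values = blocks[name]
        if len(values) != dim:
            raise ValueError(f"{name}: expected {dim} values, got {len(values)}")
        vector.extend(values)
    return vector


# Placeholder sample with all-zero values, only to check the total size.
sample = {name: [0.0] * dim for name, dim in FEATURE_DIMS.items()}
total_dims = len(concat_features(sample))
print(total_dims)
```

Under this assumption each sample would have 246 dimensions. If the region-level and province-level blocks are counted as one location group, as the feature groups in Table 5 suggest, the eight blocks correspond to the 7 feature types.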

Table 3

User fan network structure data

Attribute | Value
File size/GB | 5.6
Number of users with fans | 167 114
Total number of users | 2 529 073
Number of user-fan edges | 5.5×10^8

Table 4

Comparison of SVM, RF and xgboost results

Model | MSE | MAE | MSLE | Time/s
SVM | 4 740 092.8 | 6.09 | 1.60 | 55 800
Random forest | 1 492 180.4 | 5.25 | 1.09 | 12 647
xgboost | 1 228 629.5 | 4.86 | 0.97 | 10 237
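
The three error metrics in Table 4 are not defined on this page. The sketch below assumes their standard definitions (mean squared error, mean absolute error, and mean squared logarithmic error with log(1 + x)); the sample counts are made up for illustration and are not taken from the paper's data.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def msle(y_true, y_pred):
    """Mean squared logarithmic error; log1p keeps zero counts valid."""
    return sum(
        (math.log1p(t) - math.log1p(p)) ** 2 for t, p in zip(y_true, y_pred)
    ) / len(y_true)


# Made-up popularity counts: three small hashtags and one very popular one.
y_true = [0, 3, 10, 1000]
y_pred = [1, 3, 12, 400]

print(mse(y_true, y_pred))   # 90001.25
print(mae(y_true, y_pred))   # 150.75
print(msle(y_true, y_pred))
```

In this toy example a single badly predicted popular hashtag dominates the MSE while the MAE and MSLE stay comparatively small, which may be one reason the MSE values in Table 4 are in the millions.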

Table 5

Performance of xgboost when each feature group is removed

Feature | MSE | MAE | MSLE
all | 1 228 629.50 | 4.86 | 0.97
-time | 1 482 928.60 | 5.10 | 1.12
-topic | 1 386 303.30 | 5.84 | 1.15
-location | 1 472 190.20 | 5.63 | 1.13
-user embedding | 1 503 357.10 | 5.38 | 1.11
-user feature | 1 514 726.90 | 6.26 | 1.21
-weibo feature | 1 475 384.30 | 5.22 | 1.09
-hashtag feature | 1 509 744.40 | 5.22 | 1.10

Table 6

Change in the metrics after removing some features

Feature | MAE (%) | MSLE (%)
topic | 20.10 | 18.10
location | 15.80 | 16.00
user embedding | 10.10 | 14.10
hashtag feature | 7.00 | 13.00
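
The figures in Table 6 appear to be relative increases, in percent, of each metric over the full-feature baseline in Table 5. That reading is an assumption, since the page gives no formula. The sketch below recomputes the changes from the rounded MAE and MSLE values in Table 5; the results come close to the Table 6 figures but do not match them exactly, possibly because the authors used unrounded values.

```python
# MAE and MSLE values copied from Table 5 (already rounded there).
BASELINE = {"MAE": 4.86, "MSLE": 0.97}
ABLATED = {
    "topic": {"MAE": 5.84, "MSLE": 1.15},
    "location": {"MAE": 5.63, "MSLE": 1.13},
    "user embedding": {"MAE": 5.38, "MSLE": 1.11},
    "hashtag feature": {"MAE": 5.22, "MSLE": 1.10},
}


def relative_increase(baseline, value):
    """Percentage increase of value over baseline (assumed definition)."""
    return (value - baseline) / baseline * 100.0


changes = {
    feature: {
        metric: relative_increase(BASELINE[metric], value)
        for metric, value in metrics.items()
    }
    for feature, metrics in ABLATED.items()
}

for feature, metrics in changes.items():
    print(feature, {metric: round(pct, 2) for metric, pct in metrics.items()})
```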

Fig.2

The propagation of a message over a user's network

Table 7

Propagation details of a topic tag

User ID | Message type | Message content
0 | Original | 晚安,王嘎一王嘉尔自作曲WOLO, #王嘉尔拜托了冰箱#王嘉尔透鲜滴星期天王嘉尔Jackson
1 | Repost | @浅蓝之季:#王嘉尔拜托了冰箱#还有英智和bambam
2 | Repost | #王嘉尔拜托了冰箱#Jackson真的适合这种风格呢,特别青春洋溢另外我看到了何哥哥有爱滴小眼神,一直盯着我嘉@irene_hhy:#拜托了冰箱MC王嘉尔#
3 | Repost | #王嘉尔拜托了冰箱#哈哈哈,多点真诚啊首掌!@边际未明:?我学生卡都掏出来了车呢车呢?马赛克啥玩意儿?@jacky王嘉尔-是WANG:哈哈看截图真的好萌啊
4 | Repost | #王嘉尔拜托了冰箱#冰箱家族@supercalifragilisticexpialido7:p2—黄老师和嘉嘉的糖…第一次吃,我嘎跟个贴心小棉袄似的

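Table 8

The geographical characteristics of topic tags

Region | Specific areas | Hashtag
Overseas | Countries other than China | 韩国水晶防晒喷雾 (Korean crystal sunscreen spray), 龟岛 (Turtle Island)
Three northeastern provinces | Heilongjiang, Jilin, Liaoning | 二人转 (Errenzhuan), 冰雪大世界 (Ice and Snow World)
Northwest | Qinghai, Xinjiang, Ningxia, Gansu, Shaanxi | 羊肉泡馍 (mutton paomo), 青海湖旅游攻略 (Qinghai Lake travel guide)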

Table 9

Thematic features change over time

Hashtag | Day-1 | Day-2 | Day-3
文化新闻脱口秀 (Culture News Talk Show) | 优酷,分享,新闻,故事,沙和,文化,微笑,老师,客户,一起,看看 | 今波,贞观,李世民,孟姜女哭,土豆,文化,天子,新闻,故事,什么,中国,下载 | 今波,季-,朱由校,秘招,文化,新闻,故事,花絮,独家,明朝,皇帝,救国,个人,前世,八仙,今生,录制,爱好,中国
周杰伦中国好声音 (Jay Chou, The Voice of China) | 冯小刚,声音,大赞,领悟力,宣传片,给力,奥特曼,才情,满分,电影,幽默,合作 | 声音,杰伦,我的,偶像,中国,一直,我家,歌词,只有,随便,超级,长大,忘记,出来,好听 | 叶湘伦,忽高,颜值,只服,周杰伦歌迷,后援会,声音,音乐,杰伦,拜拜,窒息,崩溃,中国,现场,赶往,录制
家族世代 (Family Generations) | 信托,视野,观点,商界,全球,中外,大佬,产品,18年,人物,所有,结果,默默无闻,帝国 | 行善,莫忘,订阅,微信,简史,任正非,信托,工具,观点,斗争,对方,制度,迷茫,政经,技术 | 美国通用,海尔,惊天,马云,王健林,订阅,微信,5年,56亿美金,沉寂,筹谋,背后,故事,年前,人物

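Table 10

Top 30 feature dimensions ranked by importance in xgboost

Rank | Feature
1 | Time-series feature
2 | Province-level regional feature, dimension 10
3 | Region-level regional feature, dimension 1
4 | Weibo length
5 | Topic vector, dimension 1
6 | Message type
7 | Topic vector, dimension 5
8 | Contains URL
9 | Number of comments
10 | Number of reposts
11 | Province-level regional feature, dimension 20
12 | Topic vector, dimension 10
13 | Region-level regional feature, dimension 5
14 | User vector, dimension 4
15 | Number of fans
16 | Topic vector, dimension 12
17 | Topic vector, dimension 15
18 | Region-level regional feature, dimension 7
19 | User vector, dimension 10
20 | Region-level regional feature, dimension 2
21 | Number of friends
22 | Weibo sentiment
23 | Province-level regional feature, dimension 3
24 | Region-level regional feature, dimension 6
25 | User vector, dimension 34
26 | Topic vector, dimension 19
27 | Province-level regional feature, dimension 9
28 | Contains digits
29 | User vector, dimension 52
30 | User vector, dimension 64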