Journal of Shandong University (Natural Science), 2020, Vol. 55, Issue 1: 94-101. doi: 10.6040/j.issn.1671-9352.1.2018.136


Topic tag popularity prediction based on multi-dimensional features

Xin-le WANG (1, 2), Wen-feng YANG (3), Hua-ming LIAO (1), Yong-qing WANG (1), Yue LIU (1), Xiao-ming YU (1), Xue-qi CHENG (1)

  1. CAS Key Laboratory of Network Data Science and Technology, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
  2. Graduate University of Chinese Academy of Sciences, Beijing 100049, China
  3. China Mobile (Suzhou) Software Technology Co., Ltd., Suzhou 215000, Jiangsu, China

Received: 2018-10-17   Online: 2020-01-20   Published: 2020-01-10

Abstract:

This paper analyzes the structure of users' fan (follower) networks together with characteristics of the topic tags (hashtags) themselves, such as sentiment and regionality. It proposes a popularity prediction model that combines structural features of a user's fan network with features of the topic tag itself. Experiments show that the newly proposed features are effective and provide a useful reference for predicting the future popularity of topic tags.

Key words: topic tag, popularity prediction, multiple features

CLC Number: TP391

Fig. 1  Temporal frequency distribution of hashtags

Table 1  Weibo data

Attribute | Value
File size (GB) | 130
Number of Weibo messages | 2 × 10^8
Number of hashtags | 5,923,117

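Table 2  Hashtag training data

Attribute | Value
File size (GB) | 1.1
Number of training samples | 370,633
Number of feature types | 7
Weibo feature dimensions | 7
Hashtag feature dimensions | 4
Time-series feature dimensions | 30
Region-level geographic feature dimensions | 8
Province-level geographic feature dimensions | 35
Topic feature dimensions | 60
User vector representation dimensions | 100
User's own feature dimensions | 2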
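The dimension rows of Table 2 can be read as the widths of the blocks that make up one training vector. The sketch below is a minimal illustration of that reading, not the authors' pipeline: the English group names, the concatenation order, and the helper functions are all assumptions made for this example. The eight rows sum to 246 dimensions; the table's count of 7 feature types presumably treats the two geographic rows as one group, which would match the single "-location" ablation in Table 5.

```python
# Feature-group widths copied from Table 2. The English group names and the
# concatenation order are assumptions made for this sketch.
FEATURE_GROUPS = {
    "weibo": 7,
    "hashtag": 4,
    "time_series": 30,
    "region": 8,
    "province": 35,
    "topic": 60,
    "user_embedding": 100,
    "user_self": 2,
}


def group_slices(groups):
    """Return (name -> slice in the concatenated vector, total width)."""
    slices, start = {}, 0
    for name, width in groups.items():
        slices[name] = slice(start, start + width)
        start += width
    return slices, start


def concat_features(parts, groups=FEATURE_GROUPS):
    """Concatenate per-group lists of numbers, checking each group's width."""
    vector = []
    for name, width in groups.items():
        values = parts[name]
        if len(values) != width:
            raise ValueError(f"{name}: expected {width} values, got {len(values)}")
        vector.extend(values)
    return vector


SLICES, TOTAL_DIM = group_slices(FEATURE_GROUPS)

# Hypothetical all-zero sample, only to show the shapes line up.
sample = concat_features({name: [0.0] * w for name, w in FEATURE_GROUPS.items()})
print(TOTAL_DIM, len(sample), SLICES["topic"])
```

Keeping the slices by name makes an ablation such as "-topic" a matter of dropping one slice before training.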

Table 3  User fan (follower) network structure data

Attribute | Value
File size (GB) | 5.6
Number of users with fans | 167,114
Total number of users | 2,529,073
Number of user-fan edges | 5.5 × 10^8

Table 4  Comparison of SVM, random forest and XGBoost results

Model | MSE | MAE | MSLE | Time (s)
SVM | 4,740,092.8 | 6.09 | 1.60 | 55,800
Random forest | 1,492,180.4 | 5.25 | 1.09 | 12,647
XGBoost | 1,228,629.5 | 4.86 | 0.97 | 10,237
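For readers comparing the three error columns in Table 4, the following sketch gives the standard definitions of MSE, MAE and MSLE. It assumes the usual log(1 + x) form of MSLE; the page does not state which variant the authors used. The sample values are hypothetical and chosen only to show how one badly missed high-popularity tag can inflate MSE far more than MAE, which is one plausible reading of the gap between those two columns.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def msle(y_true, y_pred):
    """Mean squared logarithmic error, using log(1 + x) on both sides."""
    return sum(
        (math.log1p(t) - math.log1p(p)) ** 2 for t, p in zip(y_true, y_pred)
    ) / len(y_true)


# Hypothetical popularity counts: three tags predicted exactly, one viral tag missed.
y_true = [1, 2, 3, 1000]
y_pred = [1, 2, 3, 0]

print(mse(y_true, y_pred), mae(y_true, y_pred), msle(y_true, y_pred))
```

Because MSLE compares values on a log scale, it is much less sensitive to a single extreme count than MSE.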

Table 5  Performance of XGBoost with each feature group removed

Features | MSE | MAE | MSLE
all | 1,228,629.50 | 4.86 | 0.97
-time | 1,482,928.60 | 5.1 | 1.12
-topic | 1,386,303.30 | 5.84 | 1.15
-location | 1,472,190.20 | 5.63 | 1.13
-user embedding | 1,503,357.10 | 5.38 | 1.11
-user feature | 1,514,726.90 | 6.26 | 1.21
-weibo feature | 1,475,384.30 | 5.22 | 1.09
-hashtag feature | 1,509,744.40 | 5.22 | 1.1

Table 6  Change in each metric after removing a feature group

Feature | MAE | MSLE
Topic | 20.10 | 18.10
Location | 15.80 | 16.00
User embedding | 10.10 | 14.10
Hashtag feature | 7.00 | 13.00
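Table 6 does not state its units. One reading, offered here as an assumption rather than the authors' stated method, is that each entry is the percentage increase in the metric relative to the full model in Table 5. The sketch below recomputes that quantity from the Table 5 values; the results come out close to, though not identical with, the Table 6 figures.

```python
# MAE and MSLE copied from Table 5 (XGBoost): full model vs. one group removed.
BASELINE = {"MAE": 4.86, "MSLE": 0.97}
ABLATED = {
    "topic": {"MAE": 5.84, "MSLE": 1.15},
    "location": {"MAE": 5.63, "MSLE": 1.13},
    "user embedding": {"MAE": 5.38, "MSLE": 1.11},
    "hashtag feature": {"MAE": 5.22, "MSLE": 1.10},
}


def relative_increase(baseline, ablated):
    """Percentage increase of each metric relative to the full model."""
    return {m: 100.0 * (ablated[m] - baseline[m]) / baseline[m] for m in baseline}


changes = {name: relative_increase(BASELINE, vals) for name, vals in ABLATED.items()}

for name, vals in changes.items():
    print(f"{name}: MAE +{vals['MAE']:.1f}%, MSLE +{vals['MSLE']:.1f}%")
```

Under this reading, the ordering of the groups by impact on MAE (topic, location, user embedding, hashtag feature) agrees with the ordering in Table 6.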

Fig. 2  Propagation of a message over a user's network

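Table 7  Propagation details of a topic tag

User ID | Message type | Message content
0 | Original | Good night, Wang Ga. Jackson Wang's self-composed song WOLO, #王嘉尔拜托了冰箱# (Jackson Wang on Go Fridge) Jackson Wang, Fresh Sunday, Jackson Wang, Jackson
1 | Repost | @浅蓝之季: #王嘉尔拜托了冰箱# and also Youngji and BamBam
2 | Repost | #王嘉尔拜托了冰箱# Jackson really suits this style, so full of youthful energy. I also noticed Brother He's loving little glances, fixed on our Jia the whole time @irene_hhy: #拜托了冰箱MC王嘉尔#
3 | Repost | #王嘉尔拜托了冰箱# Hahaha, show a little more sincerity, chief! @边际未明: ? I've already got my student card out, where's the ride? What's with the mosaic? @jacky王嘉尔-是WANG: Haha, he looks so cute in the screenshots
4 | Repost | #王嘉尔拜托了冰箱# Fridge family @supercalifragilisticexpialido7: p2: the sweet moments between Teacher Huang and Jiajia... my first taste of it, our Ga is just like a caring little sweetheart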
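The messages in Table 7 mark topic tags with a pair of `#` characters. The sketch below shows one simple way to pull such tags out of message text and count how many messages mention each tag. The regular expression, the function names, and the use of a per-message count as a popularity proxy are assumptions made for illustration; the page does not describe the authors' extraction step. The sample strings are abridged from rows 1, 2 and 4 of Table 7.

```python
import re
from collections import Counter

# Weibo-style topic tags are delimited by a pair of '#' characters: #tag#.
HASHTAG_RE = re.compile(r"#([^#]+)#")


def extract_hashtags(text):
    """Return the topic tags in a message, in order of appearance."""
    return HASHTAG_RE.findall(text)


def count_messages_per_tag(messages):
    """Count how many messages mention each tag (each message counted once per tag)."""
    counts = Counter()
    for text in messages:
        counts.update(set(extract_hashtags(text)))
    return counts


# Abridged message texts from Table 7 (rows 1, 2 and 4).
MESSAGES = [
    "@浅蓝之季:#王嘉尔拜托了冰箱#还有英智和bambam",
    "#王嘉尔拜托了冰箱#Jackson真的适合这种风格呢@irene_hhy:#拜托了冰箱MC王嘉尔#",
    "#王嘉尔拜托了冰箱#冰箱家族",
]

counts = count_messages_per_tag(MESSAGES)
print(counts.most_common())
```

Note that a repost can carry a second tag from the quoted message, as in the second sample, so one message may contribute to more than one tag's count.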

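Table 8  Geographical characteristics of topic tags

Region | Areas covered | Hashtags
Overseas | Countries other than China | Korean crystal sunscreen spray; Turtle Island
Three northeastern provinces | Heilongjiang, Jilin, Liaoning | Errenzhuan; Ice and Snow World
Northwest | Qinghai, Xinjiang, Ningxia, Gansu, Shaanxi | Yangrou paomo (mutton soup with flatbread); Qinghai Lake travel guide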

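Table 9  Change of topic features over time

Hashtag: 文化新闻脱口秀 (Culture News Talk Show)
  Day-1: Youku, share, news, story, 沙和, culture, smile, teacher, customer, together, take a look
  Day-2: Jinbo, Zhenguan, Li Shimin, Meng Jiangnü weeping, Tudou, culture, emperor, news, story, what, China, download
  Day-3: Jinbo, 季-, Zhu Youxiao, secret trick, culture, news, story, behind the scenes, exclusive, Ming dynasty, emperor, save the nation, personal, past life, Eight Immortals, present life, recording, hobby, China

Hashtag: 周杰伦中国好声音 (Jay Chou, The Voice of China)
  Day-1: Feng Xiaogang, voice, high praise, comprehension, promo, awesome, Ultraman, talent, full marks, film, humor, collaboration
  Day-2: voice, Jay, my, idol, China, always, my family, lyrics, only, casually, super, grow up, forget, come out, pleasant to hear
  Day-3: Ye Xianglun, 忽高, good looks, only admire, Jay Chou fans, fan club, voice, music, Jay, bye-bye, suffocating, breakdown, China, live, rushing to, recording

Hashtag: 家族世代 (Family Generations)
  Day-1: trust, vision, viewpoint, business world, global, Chinese and foreign, tycoon, product, 18 years, figure, all, result, obscure, empire
  Day-2: do good, don't forget, subscribe, WeChat, brief history, Ren Zhengfei, trust, tool, viewpoint, struggle, the other party, system, confusion, politics and economy, technology
  Day-3: General Motors, Haier, astonishing, Jack Ma, Wang Jianlin, subscribe, WeChat, 5 years, USD 5.6 billion, silence, planning, behind, story, years ago, figure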

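Table 10  Top 30 feature dimensions ranked by XGBoost importance

Rank | Feature
1 | Time-series feature
2 | Province-level geographic feature, dimension 10
3 | Region-level geographic feature, dimension 1
4 | Weibo length
5 | Topic vector, dimension 1
6 | Message type
7 | Topic vector, dimension 5
8 | Contains URL
9 | Number of comments
10 | Number of reposts
11 | Province-level geographic feature, dimension 20
12 | Topic vector, dimension 10
13 | Region-level geographic feature, dimension 5
14 | User vector, dimension 4
15 | Number of fans
16 | Topic vector, dimension 12
17 | Topic vector, dimension 15
18 | Region-level geographic feature, dimension 7
19 | User vector, dimension 10
20 | Region-level geographic feature, dimension 2
21 | Number of friends
22 | Weibo sentiment
23 | Province-level geographic feature, dimension 3
24 | Region-level geographic feature, dimension 6
25 | User vector, dimension 34
26 | Topic vector, dimension 19
27 | Province-level geographic feature, dimension 9
28 | Contains digits
29 | User vector, dimension 52
30 | User vector, dimension 64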
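To see which feature groups dominate the ranking in Table 10, the 30 entries can be tallied by group. The sketch below does this with labels chosen for this example. Assigning the scalar entries (length, message type, URL flag, comments, reposts, sentiment, digit flag) to a "weibo" group and the fan and friend counts to a "user_self" group is an assumption, although it agrees with the group widths of 7 and 2 listed in Table 2.

```python
from collections import Counter

# One group label per rank, ranks 1..30 in order, transcribed from Table 10.
# Group labels and the assignment of scalar features to groups are assumptions.
TOP30_GROUPS = [
    "time_series", "province", "region", "weibo", "topic",              # 1-5
    "weibo", "topic", "weibo", "weibo", "weibo",                        # 6-10
    "province", "topic", "region", "user_embedding", "user_self",       # 11-15
    "topic", "topic", "region", "user_embedding", "region",             # 16-20
    "user_self", "weibo", "province", "region", "user_embedding",       # 21-25
    "topic", "province", "weibo", "user_embedding", "user_embedding",   # 26-30
]

group_counts = Counter(TOP30_GROUPS)

# Best (lowest) rank reached by each group; ranks start at 1.
best_rank = {}
for rank, group in enumerate(TOP30_GROUPS, start=1):
    best_rank.setdefault(group, rank)

for group, n in group_counts.most_common():
    print(f"{group}: {n} of top 30, best rank {best_rank[group]}")
```

Under this grouping, no hashtag-feature dimension appears among the top 30, while the geographic groups together account for nine entries.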