JOURNAL OF SHANDONG UNIVERSITY(NATURAL SCIENCE) ›› 2019, Vol. 54 ›› Issue (3): 38-45. doi: 10.6040/j.issn.1671-9352.1.2018.149


A modeling method of user growth profile

Zhe-jin DONG, Jian WANG*, Ling-fei QIAN, Hong-fei LIN

  1. Institute of Computer Science and Technology, Dalian University of Technology, Dalian 116024, Liaoning, China
  • Received: 2018-10-17 Online: 2019-03-01 Published: 2019-03-19
  • Contact: Jian WANG E-mail: zd2221@columbia.edu.cn; wangjian@dlut.edu.cn
  • Supported by:
    National Key Research and Development Program of China (2016YFB1001103)

Abstract:

User growth value reflects user stickiness, and predicting it is important for precision marketing. This paper studies user growth profiles. To handle disorganized raw data and user features that are hard to anticipate, it uses scatter-plot analysis to extract the behavior features and stable time features that influence a user's growth value, and compares two feature selection approaches, tree-based and L1-norm, to identify the key features. To address the shortage of labeled training data, it improves the COREG algorithm, enriching the labeled dataset through semi-supervised regression, which raises prediction accuracy and reduces the algorithm's time complexity. Finally, it uses the Stacking method to combine the strengths of different models. Experiments on the SMP CUP 2017 data, provided by the CSDN blog platform, show that the proposed methods effectively improve the models' generalization ability and prediction accuracy.
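The L1-norm side of the feature selection comparison can be illustrated with a small sketch. The full text is not reproduced on this page, so the following is only an assumption about the general technique, not the authors' implementation: a Lasso fitted by coordinate descent, where features whose weights shrink to exactly zero are dropped. The feature names `browse_count` and `noise`, the toy data, and the penalty `alpha` are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(rows, targets, alpha, n_iters=200):
    """Minimise (1/2n) * ||y - Xw||^2 + alpha * ||w||_1 (no intercept)."""
    n_samples = len(rows)
    n_features = len(rows[0])
    weights = [0.0] * n_features
    for _ in range(n_iters):
        for j in range(n_features):
            rho = 0.0   # correlation of feature j with the residual ignoring j
            norm = 0.0  # squared norm of feature j
            for x, y in zip(rows, targets):
                partial = sum(weights[k] * x[k]
                              for k in range(n_features) if k != j)
                rho += x[j] * (y - partial)
                norm += x[j] * x[j]
            if norm == 0.0:
                weights[j] = 0.0
                continue
            weights[j] = soft_threshold(rho / n_samples, alpha) / (norm / n_samples)
    return weights


def select_features(weights, names, tol=1e-9):
    """Keep the names of features whose weight survived the L1 penalty."""
    return [name for name, w in zip(names, weights) if abs(w) > tol]


# Hypothetical toy data: the target depends only on the first feature.
rows = [[1.0, 1.0], [2.0, -1.0], [3.0, 1.0], [4.0, -1.0]]
targets = [2.0 * r[0] for r in rows]
weights = lasso_coordinate_descent(rows, targets, alpha=0.5)
selected = select_features(weights, ["browse_count", "noise"])
print(weights, selected)
```

A tree-based selector would instead rank features by how much they reduce impurity across the splits of an ensemble. The L1 approach has the practical property shown here: uninformative features receive a weight of exactly zero rather than merely a small one.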

Key words: user growth value, user profile, feature extraction, semi-supervised regression, ensemble method

CLC Number: 

  • TP391

Fig.1

Framework of user growth prediction

Fig.2

Algorithm of semi-supervised model
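Since the figure itself is not reproduced on this page, the semi-supervised step can only be sketched in general terms. The code below follows the basic COREG idea (Zhou and Li, 2005): two kNN regressors with different distance metrics each pseudo-label the unlabeled point that most reduces squared error on its labeled neighbours, then hand that point to the other regressor. It does not include the improvements the abstract mentions. All function names, the toy data, and the parameter defaults are assumptions made for illustration.

```python
def minkowski(a, b, p):
    """Minkowski distance of order p between two feature tuples."""
    return sum(abs(u - v) ** p for u, v in zip(a, b)) ** (1.0 / p)


def neighbours(labeled, x, k, p):
    """Return the k labeled (features, target) pairs closest to x."""
    return sorted(labeled, key=lambda pair: minkowski(pair[0], x, p))[:k]


def knn_predict(labeled, x, k, p):
    """Average the targets of the k nearest labeled points."""
    near = neighbours(labeled, x, k, p)
    return sum(y for _, y in near) / len(near)


def best_candidate(labeled, pool, k, p):
    """Pick the pool point whose pseudo-label most reduces squared error
    on its own labeled neighbours; return None if nothing helps."""
    best, best_gain = None, 0.0
    for x in pool:
        pseudo = knn_predict(labeled, x, k, p)
        near = neighbours(labeled, x, k, p)
        augmented = labeled + [(x, pseudo)]
        gain = sum(
            (y - knn_predict(labeled, xi, k, p)) ** 2
            - (y - knn_predict(augmented, xi, k, p)) ** 2
            for xi, y in near
        )
        if gain > best_gain:
            best, best_gain = (x, pseudo), gain
    return best


def coreg(labeled, unlabeled, k=3, rounds=10, metrics=(2, 5)):
    """Co-train two kNN regressors; return a predictor and both labeled sets."""
    sets = [list(labeled), list(labeled)]
    pool = list(unlabeled)
    for _ in range(rounds):
        picks = [best_candidate(sets[j], pool, k, metrics[j]) for j in (0, 1)]
        if picks[0] is None and picks[1] is None:
            break
        for j in (0, 1):
            if picks[j] is not None:
                sets[1 - j].append(picks[j])  # teach the other regressor
                if picks[j][0] in pool:
                    pool.remove(picks[j][0])

    def predict(x):
        # Final estimate: average of the two co-trained regressors.
        return sum(knn_predict(sets[j], x, k, metrics[j]) for j in (0, 1)) / 2.0

    return predict, sets


# Hypothetical toy data: four labeled points and three unlabeled ones.
labeled = [((1.0,), 2.0), ((3.0,), 6.0), ((5.0,), 10.0), ((7.0,), 14.0)]
unlabeled = [(2.0,), (4.0,), (6.0,)]
predict, sets = coreg(labeled, unlabeled, k=2, rounds=5)
print(predict((4.5,)))
```

The confidence test re-evaluates neighbours for every candidate in every round, which is the main cost of this scheme and a natural place for the kind of time-complexity reduction the abstract describes.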

Fig.3

Framework of model fusion
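The fusion framework is not shown on this page, so the following is a minimal sketch of stacking in general rather than the authors' configuration. Base models produce out-of-fold predictions on the training set, and a meta-learner is fitted on those predictions. To stay self-contained, it assumes two toy base models (a 1-D kNN regressor and a mean predictor) and a deliberately simple meta-learner: a single least-squares blend weight clipped to [0, 1]. All names and data are hypothetical.

```python
def knn_regressor(train, k):
    """Return a predictor averaging the k nearest training targets (1-D x)."""
    def predict(x):
        near = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
        return sum(y for _, y in near) / len(near)
    return predict


def mean_regressor(train):
    """Return a predictor that always outputs the training mean."""
    mean = sum(y for _, y in train) / len(train)
    return lambda x: mean


def out_of_fold(train, make_model, n_folds):
    """Predict every training point with a model that never saw it."""
    preds = [0.0] * len(train)
    for fold in range(n_folds):
        held_out = [i for i in range(len(train)) if i % n_folds == fold]
        rest = [train[i] for i in range(len(train)) if i % n_folds != fold]
        model = make_model(rest)
        for i in held_out:
            preds[i] = model(train[i][0])
    return preds


def fit_stack(train, makers, n_folds=3):
    """Fit two base models plus a one-weight linear meta-learner."""
    first, second = (out_of_fold(train, m, n_folds) for m in makers)
    targets = [y for _, y in train]
    # Meta-learner: least-squares w for w * first + (1 - w) * second.
    num = sum((a - b) * (y - b) for a, b, y in zip(first, second, targets))
    den = sum((a - b) ** 2 for a, b in zip(first, second))
    w = min(1.0, max(0.0, num / den)) if den > 0 else 0.5
    models = [m(train) for m in makers]  # refit base models on all data

    def predict(x):
        return w * models[0](x) + (1 - w) * models[1](x)

    return predict, w


# Hypothetical toy data: y = 2x on x = 1..9.
train = [(float(x), 2.0 * x) for x in range(1, 10)]
stacked, weight = fit_stack(train, [lambda t: knn_regressor(t, 2), mean_regressor])
print(weight, stacked(4.5))
```

Fitting the meta-learner on out-of-fold predictions, rather than on predictions for data the base models were trained on, is what keeps it from simply rewarding the base model that overfits most.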

Table 1

User data

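Data category | Data content | Data volume
User content data | Blogs published by users | 1,000,000 documents
User behavior data | Blog publishing | 1,000,000 records
User behavior data | Blog browsing | 3,536,444 records
User behavior data | Blog commenting | 182,273 records
User behavior data | Blog upvoting | 95,668 records
User behavior data | Blog downvoting | 9,326 records
User behavior data | Blog bookmarking | 104,723 records
Social relationship data | Follow relationships between users | 667,037 records
Social relationship data | Private message relationships between users | 46,572 records
Growth value | Users' growth values in 2016 | 1,015 records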

Fig.4

User behavior data record

Table 2

Prediction accuracy of models

Feature RF SVM kNN ETR GBT
WB 0.743 0.721 0.745 0.751 0.793
L1B 0.570 0.530 0.580 0.567 0.638
TreeB 0.753 0.730 0.754 0.753 0.793
WB+WT 0.770 0.755 0.747 0.761 0.779
TreeB+WT 0.758 0.550 0.533 0.754 0.770
TreeB+TreeT 0.770 0.779 0.756 0.756 0.787
TreeB+TreeT+FkNN 0.767 0.755 0.762 0.761 0.781
TreeB+TreeT+FSVM 0.777 0.779 0.777 0.777 0.786
Stacking+FSVM 0.800

Fig.5

Scatter plot of login time and growth value

Fig.6

Scatter plot of active months and growth values

Fig.7

Scatter plot of browsing count and growth value

Fig.8

Scatter plot of browsing count and growth value for samples added by SVM

Table 3

Prediction scores with semi-supervised learning (n=1000)

K RF SVM kNN ETR GBT
0 0.771 0.776 0.766 0.772 0.782
1 0.771 0.776 0.766 0.773 0.782
2 0.765 0.777 0.766 0.766 0.780
3 0.765 0.777 0.766 0.766 0.780
4 0.775 0.774 0.768 0.775 0.786
5 0.774 0.776 0.772 0.772 0.784
6 0.775 0.776 0.771 0.778 0.78
7 0.777 0.779 0.774 0.779 0.786
8 0.775 0.779 0.772 0.777 0.786
9 0.775 0.779 0.771 0.779 0.782
10 0.776 0.779 0.773 0.777 0.782

Fig.9

Trend of prediction accuracy with k value
