Journal of Shandong University (Natural Science), 2019, Vol. 54, Issue 3: 38-45. doi: 10.6040/j.issn.1671-9352.1.2018.149
Zhe-jin DONG, Jian WANG*, Ling-fei QIAN, Hong-fei LIN
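Abstract:
User growth value reflects user stickiness, and predicting it helps enable precision marketing. This paper focuses on user growth profiling. Because raw user records are complex and varied, effective features are hard to extract; scatter-plot analysis is therefore used to identify the factors that influence user growth value, behavioral features and relatively stable temporal features are extracted, and tree-based feature selection is compared with L1-norm feature selection. Because user data labeled with growth values is scarce, the COREG algorithm is improved: a semi-supervised learning model enriches the training data, which raises prediction accuracy while reducing the time complexity of the original algorithm. Finally, model fusion is used to combine the strengths of different models. Experiments on the SMP CUP 2017 dataset provided by the CSDN blog platform show that the resulting model effectively improves generalization ability and prediction accuracy.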
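To make the semi-supervised step more concrete, the following is a minimal sketch of the general COREG idea (co-training two kNN regressors that use different Minkowski distance orders), not the improved variant the abstract describes, whose details are not given here. All function names, parameter defaults, and the synthetic data are assumptions made for illustration; the averaging at the end is only a simple stand-in for combining the two regressors.

```python
import random

def minkowski(a, b, p):
    """Minkowski distance of order p between two feature tuples."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def neighbors(x, data, k, p):
    """The k (features, target) pairs in `data` nearest to x."""
    return sorted(data, key=lambda item: minkowski(x, item[0], p))[:k]

def knn_predict(x, data, k, p):
    """kNN regression: mean target of the k nearest training pairs."""
    near = neighbors(x, data, k, p)
    return sum(y for _, y in near) / len(near)

def best_candidate(train, pool, k, p):
    """Return (x, pseudo_label) for the pool point whose pseudo-label most
    reduces squared error on its own neighbours, or None if none helps."""
    best, best_gain = None, 0.0
    for x in pool:
        y_hat = knn_predict(x, train, k, p)
        augmented = train + [(x, y_hat)]
        gain = 0.0
        for xn, yn in neighbors(x, train, k, p):
            before = (yn - knn_predict(xn, train, k, p)) ** 2
            after = (yn - knn_predict(xn, augmented, k, p)) ** 2
            gain += before - after
        if gain > best_gain:
            best, best_gain = (x, y_hat), gain
    return best

def coreg(labeled, unlabeled, k=3, p1=2, p2=5, rounds=5, pool_size=20, seed=0):
    """Simplified COREG-style co-training; returns a prediction function."""
    rng = random.Random(seed)
    train = [list(labeled), list(labeled)]  # one training set per regressor
    orders = [p1, p2]                       # distance order per regressor
    remaining = list(unlabeled)
    for _ in range(rounds):
        rng.shuffle(remaining)
        pool = remaining[:pool_size]
        picks = [best_candidate(train[j], pool, k, orders[j]) for j in (0, 1)]
        if picks[0] is None and picks[1] is None:
            break  # neither regressor found a helpful pseudo-label
        for j in (0, 1):
            if picks[j] is not None:
                x, y_hat = picks[j]
                train[1 - j].append((x, y_hat))  # each regressor teaches the other
                if x in remaining:
                    remaining.remove(x)

    def predict(x):
        # Average the two regressors' outputs.
        return 0.5 * (knn_predict(x, train[0], k, p1)
                      + knn_predict(x, train[1], k, p2))
    return predict

# Synthetic illustration only: hypothetical target, not the SMP CUP 2017 data.
data_rng = random.Random(42)
def target(x):
    return 2.0 * x[0] + x[1]

points = [(data_rng.random(), data_rng.random()) for _ in range(50)]
labeled = [(x, target(x)) for x in points[:10]]
unlabeled = points[10:]
predict = coreg(labeled, unlabeled)
estimate = predict((0.5, 0.5))
```

Because each pseudo-label is a mean of existing targets, every prediction stays within the range of the labeled targets; whether pseudo-labeling actually helps depends on the data, which is why a candidate is accepted only when it lowers the error on its own neighbours.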