Journal of Shandong University (Natural Science), 2019, Vol. 54, Issue (7): 57-67. doi: 10.6040/j.issn.1671-9352.1.2018.077
Xiang-wen LIAO1,2,3,*, Yang XU1,2,3, Jing-jing WEI4, Ding-da YANG1,2,3, Guo-long CHEN1,2,3
Abstract:
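Existing methods for spammer review detection incur excessive complexity, both when extracting user behavior relations and when extracting features with neural networks. In addition, online reviews are short texts, and their irregular writing makes textual features hard to extract during training; existing methods also give insufficient consideration to imbalanced dataset distributions. To address these problems, a spammer review detection method based on a two-layer stacking classification model is proposed. First, a matrix representing the relations between users is constructed from triples, and principal component analysis is applied to obtain a low-dimensional user relation representation, which captures the behavioral differences among users in the review data and reduces computational complexity. Second, paragraph vector representations of the reviews are combined with computed discrete features (including text similarity and information entropy) to address the difficulty of extracting textual features. Finally, the three representations are concatenated into an overall feature representation that fuses textual and behavioral features. An ensemble learning approach is then used to build a two-layer stacking classification model that classifies the reviews, improving detection performance on imbalanced datasets. Experiments on the Yelp2013 review dataset show that, compared with the best current baseline methods, the F1 score improves by 1.7% to 5.2%, with the improvement especially pronounced on imbalanced datasets.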
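The discrete-feature and fusion steps described in the abstract can be illustrated with a minimal sketch. It covers only two of the named discrete features (text similarity and information entropy) and the final concatenation. The abstract does not give exact feature definitions, so the whitespace tokenization, the bag-of-words cosine similarity, and the function names `word_entropy`, `cosine_similarity`, `max_similarity_to_others`, and `fuse` are assumptions made for illustration. The PCA user-relation vector and the paragraph vector are taken as precomputed inputs, and the stacking classifier is not shown.

```python
import math
from collections import Counter


def word_entropy(text):
    """Shannon entropy (in bits) of the word distribution of one review."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the bag-of-words count vectors of two reviews."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def max_similarity_to_others(review, other_reviews):
    """Highest similarity between a review and any other review (0.0 if none)."""
    return max((cosine_similarity(review, o) for o in other_reviews), default=0.0)


def fuse(relation_vec, paragraph_vec, discrete_vec):
    """Concatenate the three representations into one feature vector."""
    return list(relation_vec) + list(paragraph_vec) + list(discrete_vec)


if __name__ == "__main__":
    review = "great food great service"
    others = ["great food", "terrible parking"]
    discrete = [word_entropy(review), max_similarity_to_others(review, others)]
    # Placeholder vectors stand in for the PCA and paragraph-vector outputs.
    features = fuse([0.1, -0.3], [0.5, 0.2, 0.0], discrete)
    print(len(features), features)
```

Near-duplicate reviews yield a similarity close to 1, and highly repetitive text yields low entropy; both signals are plausibly useful for short, irregular review text, although the paper's own definitions may differ.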
|