JOURNAL OF SHANDONG UNIVERSITY(NATURAL SCIENCE) ›› 2019, Vol. 54 ›› Issue (7): 57-67. doi: 10.6040/j.issn.1671-9352.1.2018.077


Review spam detection based on the two-level stacking classification model

Xiang-wen LIAO1,2,3,*, Yang XU1,2,3, Jing-jing WEI4, Ding-da YANG1,2,3, Guo-long CHEN1,2,3

    1. College of Mathematics and Computer Science, Fuzhou University, Fuzhou 350116, Fujian, China
    2. Fujian Provincial Key Laboratory of Network Computing and Intelligent Information Processing, Fuzhou University, Fuzhou 350116, Fujian, China
    3. Digital Fujian Institute of Financial Big Data, Fuzhou 350116, Fujian, China
    4. College of Electronics and Information Science, Fujian Jiangxia University, Fuzhou 350108, Fujian, China
  • Received: 2018-10-17 Online: 2019-07-20 Published: 2019-06-27
  • Contact: Xiang-wen LIAO


Existing methods for review spam detection have two shortcomings. First, their time and space complexity is high when extracting user behavior relationships and training neural networks. Second, the non-standard writing of e-commerce reviews makes contextual features indistinct, and most experiments do not consider the effect of data imbalance. We therefore propose a review spam detection method based on a two-level stacking classification model. In this method, the relationship between users and products is represented by a triplet. To characterize user behavior while reducing complexity, low-dimensional feature representations are obtained by principal component analysis. Then the extracted paragraph vector representations, information entropy, and text similarity are represented as discrete features to avoid indistinct contextual features. Finally, the three are concatenated into an overall feature representation that combines text and behavioral features. These features serve as the input to the two-level stacking classification model in order to improve performance on unbalanced datasets. We conducted experiments on the Yelp 2013 dataset. Experimental results show that the F1 value of the proposed method is 1.7% to 5.2% higher than that of the state-of-the-art method. Moreover, classification performance is significantly improved on the unbalanced dataset.
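
To illustrate the kind of discrete text features the abstract mentions, the following minimal Python sketch computes a word-level Shannon entropy and a bag-of-words cosine similarity for review texts, then bins a continuous value into a discrete level. The function names, the whitespace tokenization, and the bin thresholds are assumptions made for this example; the abstract does not give the exact definitions used in the paper.

```python
import math
from collections import Counter
from typing import Sequence


def word_entropy(text: str) -> float:
    """Shannon entropy (in bits) of the word distribution of one review."""
    words = text.lower().split()  # assumed tokenization: split on whitespace
    if not words:
        return 0.0
    total = len(words)
    counts = Counter(words)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between the bag-of-words count vectors of two reviews."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0


def discretize(value: float, thresholds: Sequence[float]) -> int:
    """Map a continuous value to a bin index (hypothetical binning scheme)."""
    return sum(value >= t for t in thresholds)


if __name__ == "__main__":
    review = "great food great service great price"
    other = "great food and great service"
    print(word_entropy(review))
    print(discretize(cosine_similarity(review, other), [0.5, 0.9]))
```

A highly repetitive review yields low entropy, and a review that nearly duplicates another yields a similarity close to 1; binning turns both signals into small integer features that do not depend on word order.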

Key words: review detection, feature fusion, ensemble learning, principal component analysis

CLC Number: TP391


Review spam detection based on the two-level stacking model

Table 1 Discrete features

Construction of the classification model based on ensemble learning

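Table 2 Review spam detection algorithm based on the two-level stacking classification model

Input: review dataset X{x1, x2, …, xn}, preset parameters
Output: set of review detection results Y{y1, y2, …, yn}
5: Concatenate the features F = concatenate{T, D, L} to obtain the overall feature representation;
9: Output the results Y{y1, y2, …, yn}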
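
The concatenation in step 5 and the two-level stacking idea can be sketched in plain Python as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the `Stump` base learners, the `WeightedVote` meta-learner, the two-fold split, and all feature values are hypothetical. The listing does not say which classifiers are used at either level, nor what T, D, and L contain beyond being the three feature groups.

```python
from statistics import mean


def concatenate(*parts):
    """Join the per-review feature groups (e.g. T, D, L) into one vector F."""
    return [value for part in parts for value in part]


class Stump:
    """Hypothetical level-1 learner: thresholds a single feature."""

    def __init__(self, index):
        self.index = index

    def fit(self, X, y):
        pos = mean(row[self.index] for row, label in zip(X, y) if label == 1)
        neg = mean(row[self.index] for row, label in zip(X, y) if label == 0)
        self.threshold = (pos + neg) / 2     # midpoint of the two class means
        self.sign = 1 if pos >= neg else -1  # which side of the threshold is spam
        return self

    def predict(self, X):
        return [int(self.sign * (row[self.index] - self.threshold) > 0) for row in X]


class WeightedVote:
    """Hypothetical level-2 learner: weights each base learner by its accuracy."""

    def fit(self, Z, y):
        self.weights = [mean(int(row[j] == label) for row, label in zip(Z, y))
                        for j in range(len(Z[0]))]
        return self

    def predict(self, Z):
        half = sum(self.weights) / 2
        return [int(sum(w * z for w, z in zip(self.weights, row)) > half) for row in Z]


def out_of_fold(factory, X, y, k):
    """Predict every sample with a learner that did not see it during training."""
    preds = [0] * len(X)
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        held_out = [i for i in range(len(X)) if i % k == fold]
        learner = factory().fit([X[i] for i in train], [y[i] for i in train])
        for i, p in zip(held_out, learner.predict([X[i] for i in held_out])):
            preds[i] = p
    return preds


def fit_stacking(factories, X, y, k=2):
    # Level-1 out-of-fold predictions become the level-2 training inputs.
    Z = list(zip(*(out_of_fold(f, X, y, k) for f in factories)))
    meta = WeightedVote().fit(Z, y)
    base = [f().fit(X, y) for f in factories]  # refit level 1 on all data
    return base, meta


def predict_stacking(model, X):
    base, meta = model
    return meta.predict(list(zip(*(b.predict(X) for b in base))))


# Made-up feature groups for six reviews; label 1 = spam, 0 = non-spam.
T = [[0.9], [0.8], [0.1], [0.2], [0.85], [0.15]]
D = [[1], [1], [0], [0], [1], [0]]
L = [[0.2], [0.7], [0.3], [0.6], [0.5], [0.4]]
y = [1, 1, 0, 0, 1, 0]
F = [concatenate(t, d, l) for t, d, l in zip(T, D, L)]  # step 5

factories = [lambda: Stump(0), lambda: Stump(1), lambda: Stump(2)]
model = fit_stacking(factories, F, y)
print(predict_stacking(model, [concatenate([0.7], [1], [0.1])]))
```

Training the level-2 learner on out-of-fold predictions rather than on in-sample predictions keeps it from simply trusting whichever base learner overfits the training data. Here the uninformative third feature ends up with a low weight.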

Table 3 Dataset statistics

Number of spam reviews           802     8 368
Number of non-spam reviews     4 876    50 149
Total number of reviews        5 678    58 517
Total number of reviewers      5 124    35 593
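
The degree of class imbalance implied by the counts under Table 3 can be checked with a few lines of Python. The column labels below are placeholders, since the table's column headers are not given here; the split of each row into two values is confirmed by the check that spam plus non-spam equals the total.

```python
# Counts read from Table 3; "first column" and "second column" are placeholder names.
subsets = {
    "first column": {"spam": 802, "non_spam": 4876, "total": 5678},
    "second column": {"spam": 8368, "non_spam": 50149, "total": 58517},
}


def spam_share(counts):
    """Fraction of reviews labeled as spam."""
    return counts["spam"] / counts["total"]


for name, counts in subsets.items():
    # Sanity check: the two classes must add up to the reported total.
    assert counts["spam"] + counts["non_spam"] == counts["total"]
    ratio = counts["non_spam"] / counts["spam"]
    print(f"{name}: {spam_share(counts):.1%} spam, "
          f"{ratio:.1f} non-spam reviews per spam review")
```

In both columns spam makes up roughly one review in seven, which is the imbalance the abstract refers to.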

Table 4 Experimental parameter settings

Table 5 Model evaluation results

Table 6 Comparison of different feature extraction methods

Table 7 Typical discrete features of the two types of reviews

Table 8 Comparison of discrete feature effects

Comparison of F1 values of different classifier models

Impact of dataset distribution on F1 values
