A semi-supervised spam review classification method based on heuristic rules

doi:10.6040/j.issn.1671-9352.1.2016.PC6

Abstract

Abstract: The Internet now affects everyone's life. E-commerce activities such as online shopping, group purchasing, and online consumption have become some of the most popular consumption patterns. Almost every e-commerce website enables and encourages its customers to write reviews of its products and services. These customer-generated reviews are valuable to potential consumers and merchants, which has led to a situation in which spam reviews are deliberately and manually posted on e-commerce websites to promote products or to damage the reputation of other merchants. Against this background, research on spam review detection aims to filter out spam reviews and make full use of normal customer reviews. This paper focuses on COAE2015-TASK4, a public task on spam review detection. We propose a semi-supervised spam review classification method based on heuristic rules, using the corpus resources provided by COAE2015-TASK4. Experiments show that our method can effectively detect spam reviews while maintaining high classification accuracy on normal customer reviews.

Key words: spam review classification, heuristic rules, semi-supervised learning

CLC Number: TP391

ZHANG Peng, WANG Su-ge, LI De-yu, WANG Jie. A semi-supervised spam review classification method based on heuristic rules[J]. Journal of Shandong University (Natural Science), 2017, 52(7): 44-51.
