J4 ›› 2011, Vol. 46 ›› Issue (5): 71-76.


Research on spam detection techniques based on clustering

JIANG Sheng-yi1, PANG Guan-song2, ZHANG Jian-jun3   

  1. School of Informatics, Guangdong University of Foreign Studies, Guangzhou 510420, Guangdong, China;
    2. School of Management, Guangdong University of Foreign Studies, Guangzhou 510006, Guangdong, China;
    3. College of Science, Naval University of Engineering, Wuhan 430033, Hubei, China
  • Received: 2010-12-06 Published: 2011-05-25


With the surge in email spam, detecting it has become an important and urgent problem. To address the shortcomings of kNN-based spam detection, an improved kNN spam detection approach based on clustering is proposed. First, using the least-distance principle, the training email texts are partitioned into several hyperspheres of approximately equal radius; each hypersphere may contain texts from one or more categories. Second, the clusters (hyperspheres) are tagged by majority voting, that is, each cluster is tagged with the category that accounts for the most texts in it, and the tagged clusters together form the detection model. Finally, incoming email texts are classified against this model with the kNN approach. Experimental results show that the proposed approach substantially reduces the number of text similarity computations and performs better than iMBL, Naïve Bayesian, and Stacking. Furthermore, the detection model built by the proposed approach can be updated incrementally, which makes it practical for real-world applications.
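The three steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The bag-of-words representation, the cosine similarity measure, the similarity `threshold` (standing in for the hypersphere radius), the unweighted vote among the `k` nearest clusters, and all function names are assumptions made for illustration only.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts (an assumed, deliberately simple representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def add_sample(clusters, text, label, threshold):
    """Assign one labeled text to its most similar cluster, or open a new one.

    Calling this on new mail later is how the model is updated incrementally.
    """
    vec = vectorize(text)
    best = max(clusters, key=lambda c: cosine(vec, c["centroid"]), default=None)
    if best is not None and cosine(vec, best["centroid"]) >= threshold:
        best["centroid"].update(vec)   # summed counts; cosine ignores the scale
        best["labels"][label] += 1
    else:
        clusters.append({"centroid": Counter(vec), "labels": Counter([label])})


def single_pass_cluster(samples, threshold):
    """Step 1: a single scan over (text, label) pairs builds the clusters."""
    clusters = []
    for text, label in samples:
        add_sample(clusters, text, label, threshold)
    return clusters


def cluster_tag(cluster):
    """Step 2: tag a cluster with the category holding the most texts in it."""
    return cluster["labels"].most_common(1)[0][0]


def classify(clusters, text, k=3):
    """Step 3: kNN over cluster centroids instead of over every training text."""
    vec = vectorize(text)
    nearest = sorted(clusters, key=lambda c: cosine(vec, c["centroid"]),
                     reverse=True)[:k]
    votes = Counter(cluster_tag(c) for c in nearest)
    return votes.most_common(1)[0][0]


# Toy data, invented purely for this example.
train = [
    ("win free money now", "spam"),
    ("free money win prize", "spam"),
    ("meeting agenda for monday", "ham"),
    ("monday meeting agenda notes", "ham"),
]
clusters = single_pass_cluster(train, threshold=0.3)
print(len(clusters), [cluster_tag(c) for c in clusters])
print(classify(clusters, "free prize money", k=1))
```

Because classification compares a message against cluster centroids rather than against every training text, the number of similarity computations scales with the number of clusters, which is the saving the abstract refers to.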

Key words: spam detection; kNN text categorization; single pass clustering; incremental modeling
