Journal of Shandong University (Natural Science) 《山东大学学报(理学版)》

2010, Vol. 45, Issue (3): 34-40.

• Article

一种基于EVS相似度的邮件社区聚类方法

王芳 郭华平 牛常勇 范明   

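  1. School of Information Engineering, Zhengzhou University, Zhengzhou 450052, Henan, China
  • Received: 2009-12-30 Online: 2010-03-16 Published: 2010-04-02
  • About the author: WANG Fang (born 1983), female, master's student; main research interests: data mining and machine learning. Email: ie03wangfang@163.com
  • Funding:

    National Natural Science Foundation of China (No. 60773048)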

New email community clustering method based on EVS similarity  

 WANG Fang, GUO Hua-Ping, NIU Chang-Yong, FAN Ming   

  1. School of Information Engineering, Zhengzhou University, Zhengzhou 450052, Henan, China
  • Received:2009-12-30 Online:2010-03-16 Published:2010-04-02

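摘要 (Abstract):

The core of any clustering method is how to measure the proximity between objects. This paper introduces a vector representation of email features, constructs an email feature matrix, and fits the communication feature information between emails with a transformed extreme value distribution function model. On this basis, a new proximity measure, extreme value distribution similarity (EVS), is proposed to guide email community partitioning, and its effectiveness is verified with a micro-clustering/macro-clustering email community partitioning algorithm. Experiments show that, on the test datasets, the partitioning algorithm that uses EVS as its partitioning criterion discovers high-quality email communities more effectively than classical proximity measures such as cosine similarity and PCC.

关键词 (Keywords): social network; email community partitioning; extreme value distribution; EVS similarity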

Abstract:

Measuring the proximity between objects is the key problem in clustering. An email feature vector is introduced and an email feature matrix is constructed. The email feature information is fitted with a transformed extreme value distribution function model. On this basis, extreme value distribution similarity (EVS) is proposed for email community clustering. The effectiveness of the new measure is verified with a micro-macro clustering algorithm. Experiments show that, compared with cosine similarity and the Pearson correlation coefficient (PCC), the algorithm using the newly proposed similarity measure identifies higher-quality communities.

Key words: social network; email community partition; extreme value distribution; EVS similarity
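
The two baseline proximity measures named in the abstract, cosine similarity and the Pearson correlation coefficient, can be illustrated with a minimal sketch. The feature vectors, the account names, and the helper `pairwise` are hypothetical and are not taken from the paper; the EVS measure itself is not reproduced here because the abstract does not state its formula.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


def pearson_correlation(u, v):
    """PCC, computed as the cosine similarity of the mean-centred vectors."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    centred_u = [a - mean_u for a in u]
    centred_v = [b - mean_v for b in v]
    return cosine_similarity(centred_u, centred_v)


def pairwise(features, measure):
    """Apply a proximity measure to every unordered pair of accounts."""
    names = sorted(features)
    return {
        (a, b): measure(features[a], features[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Hypothetical email feature matrix: one row per account, one column per
# contact, each entry a count of messages exchanged (assumed layout).
features = {
    "alice": [12, 0, 3, 7],
    "bob": [10, 1, 2, 8],
    "carol": [0, 9, 6, 0],
}

if __name__ == "__main__":
    for pair, score in pairwise(features, cosine_similarity).items():
        print("cosine", pair, round(score, 3))
    for pair, score in pairwise(features, pearson_correlation).items():
        print("pcc", pair, round(score, 3))
```

A clustering algorithm would take such a pairwise score table as its partitioning criterion; swapping `measure` is the only change needed to compare one proximity measure against another on the same feature matrix.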
