Journal of Shandong University (Natural Science) (《山东大学学报(理学版)》), 2024, Vol. 59, Issue (8): 118-126. doi: 10.6040/j.issn.1671-9352.0.2023.250
Zhiqiang YANG, Shan FENG*, Yi YIN, Huijia WU
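Abstract:
Based on granulation features of neighborhood rough sets, such as the neighborhood relative ratio of an object and object importance, an improved multi-factor fusion outlier detection algorithm based on neighborhood rough entropy (neighborhood rough entropy-based outlier, NREOD) is proposed. Comparative experiments on standard datasets from the University of California, Irvine (UCI) repository show that the NREOD algorithm achieves a lower misjudgment rate in outlier detection across different types of datasets, and that it has better adaptability and effectiveness. The algorithm provides a new and effective approach to the research and application of outlier detection on datasets with mixed-type attributes.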
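The abstract does not give the NREOD formulas, so the following is only a minimal sketch of the general neighborhood-based idea for mixed-type data, not the authors' algorithm. The distance function, the radius `delta`, the commonly used neighborhood information entropy form, and the score `1 - |n(x)| / n` are all assumptions chosen for illustration, as are the function names and the toy data.

```python
import math

def mixed_distance(x, y, numeric_idx):
    """Assumed distance for mixed attributes: numeric attributes contribute
    |a - b| (values taken as pre-scaled to [0, 1]); nominal attributes
    contribute 0 if equal and 1 otherwise. The maximum is returned."""
    diffs = []
    for i, (a, b) in enumerate(zip(x, y)):
        if i in numeric_idx:
            diffs.append(abs(a - b))
        else:
            diffs.append(0.0 if a == b else 1.0)
    return max(diffs)

def neighborhoods(data, numeric_idx, delta):
    """For each object, list the indices of objects within radius delta."""
    return [
        [j for j, y in enumerate(data)
         if mixed_distance(x, y, numeric_idx) <= delta]
        for x in data
    ]

def neighborhood_entropy(nbrs):
    """Neighborhood information entropy: -(1/n) * sum(log2(|n(x)| / n))."""
    n = len(nbrs)
    return -sum(math.log2(len(nb) / n) for nb in nbrs) / n

def outlier_scores(data, numeric_idx, delta):
    """Illustrative score (an assumption, not NREOD): 1 - |n(x)| / n,
    so objects with sparse neighborhoods receive higher scores."""
    nbrs = neighborhoods(data, numeric_idx, delta)
    n = len(data)
    return [1.0 - len(nb) / n for nb in nbrs]

# Hypothetical toy data: one numeric attribute and one nominal attribute.
data = [(0.10, "a"), (0.12, "a"), (0.15, "a"), (0.11, "a"), (0.95, "b")]
numeric_idx = {0}
scores = outlier_scores(data, numeric_idx, delta=0.1)
entropy = neighborhood_entropy(neighborhoods(data, numeric_idx, 0.1))
most_suspicious = max(range(len(data)), key=lambda i: scores[i])
```

In this toy data the last object differs from the others on both attributes, so its neighborhood contains only itself and it receives the highest score. A multi-factor method such as the one the abstract describes would combine several such granulation features rather than rely on neighborhood size alone.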