JOURNAL OF SHANDONG UNIVERSITY(NATURAL SCIENCE), 2024, Vol. 59, Issue (8): 118-126. doi: 10.6040/j.issn.1671-9352.0.2023.250


An efficient outlier detection method based on multi-factor fusion

Zhiqiang YANG, Shan FENG*, Yi YIN, Huijia WU

  1. College of Mathematics and Science, Sichuan Normal University, Chengdu 610068, Sichuan, China
  • Received:2023-06-05 Online:2024-08-20 Published:2024-07-31
  • Contact: Shan FENG E-mail:1601298260@qq.com;fengshanrq@sohu.com

Abstract:

Drawing on the granularity characteristics of the relative neighborhood ratio of objects and on the importance of objects in neighborhood rough sets, this paper proposes an improved outlier detection method (neighborhood rough entropy-based outlier detection, NREOD) that combines neighborhood rough entropy with multi-factor fusion. Comparative experiments on standard data sets from the University of California, Irvine (UCI) repository show that the NREOD algorithm achieves a lower false positive rate across different types of data sets and offers better adaptability and effectiveness. The algorithm provides a new and effective approach for research on, and application of, outlier detection in mixed-attribute data sets.

Key words: data mining, outlier detection, neighborhood rough set, neighborhood rough entropy, multi-factor fusion

CLC Number: 

  • TP181
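
The abstract does not state the formal definitions, so the following is only a rough sketch of the neighborhood-granule idea behind neighborhood rough sets on mixed-attribute data. The distance function, the threshold `delta`, the toy data, and the entropy form used here are all assumptions chosen for illustration; they are not the NREOD outlier factor or the paper's neighborhood rough entropy.

```python
import math

def mixed_distance(x, y, numeric_idx, nominal_idx):
    """Assumed mixed-attribute distance: squared difference on numeric
    attributes (taken to be scaled to [0, 1]) and 0/1 mismatch on
    nominal attributes, combined under a square root."""
    total = 0.0
    for i in numeric_idx:
        total += (x[i] - y[i]) ** 2
    for i in nominal_idx:
        total += 0.0 if x[i] == y[i] else 1.0
    return math.sqrt(total)

def neighborhoods(data, delta, numeric_idx, nominal_idx):
    """For each object, the set of indices within distance delta of it."""
    return [
        {j for j, y in enumerate(data)
         if mixed_distance(x, y, numeric_idx, nominal_idx) <= delta}
        for x in data
    ]

def relative_sizes(nbrs):
    """Neighborhood size of each object relative to the universe size."""
    n = len(nbrs)
    return [len(s) / n for s in nbrs]

def neighborhood_entropy(nbrs):
    """An entropy-style summary of granule sizes (illustrative form):
    -(1/|U|) * sum(log2(|n(x)| / |U|))."""
    n = len(nbrs)
    return -sum(math.log2(len(s) / n) for s in nbrs) / n

# Hypothetical toy universe: one numeric and one nominal attribute.
data = [(0.10, "a"), (0.12, "a"), (0.15, "a"), (0.90, "b")]
nbrs = neighborhoods(data, delta=0.2, numeric_idx=[0], nominal_idx=[1])
sizes = relative_sizes(nbrs)
entropy = neighborhood_entropy(nbrs)
```

In this toy universe the last object has only itself as a neighbor, so its relative neighborhood size is the smallest; a small neighborhood is the kind of signal that a neighborhood-based outlier factor can build on.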

Fig.1

Measure structure of neighborhood rough entropy outlier factor fused with neighborhood relative

Fig.2

NREOD algorithm flowchart

Table 1

Comparison of experimental results on the German data set UG (|UG|=714, |RUG|=14)

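| K/% | k | NREOD T1/% | NREOD t1 | KNN T2/% | KNN t2 | DIS T3/% | DIS t3 | FindCBLOF T4/% | FindCBLOF t4 | SEQ T5/% | SEQ t5 | IE T6/% | IE t6 | NGOD T7/% | NGOD t7 | OD_NGE T8/% | OD_NGE t8 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.98 | 7 | 14.29 | 2 | 28.57 | 4 | 14.29 | 2 | 21.43 | 3 | 14.29 | 2 | 21.43 | 3 | 21.43 | 3 | 21.43 | 3 |
| 1.96 | 14 | 35.71 | 5 | 28.57 | 4 | 21.43 | 3 | 35.71 | 5 | 35.71 | 5 | 28.57 | 4 | 42.86 | 6 | 35.71 | 5 |
| 5.04 | 36 | 85.71 | 12 | 57.14 | 8 | 35.71 | 5 | 50.00 | 7 | 64.29 | 9 | 57.14 | 8 | 92.86 | 13 | 71.43 | 10 |
| 6.02 | 43 | 100.00 | 14 | 57.14 | 8 | 42.86 | 6 | 71.43 | 10 | 64.29 | 9 | 71.43 | 10 | 92.86 | 13 | 78.57 | 11 |
| 7.56 | 54 | 100.00 | 14 | 64.29 | 9 | 64.29 | 9 | 100.00 | 14 | 71.43 | 10 | 92.86 | 13 | 100.00 | 14 | 85.71 | 12 |
| 11.20 | 80 | 100.00 | 14 | 100.00 | 14 | 78.57 | 11 | 100.00 | 14 | 71.43 | 10 | 100.00 | 14 | 100.00 | 14 | 92.86 | 13 |
| 16.95 | 121 | 100.00 | 14 | 100.00 | 14 | 100.00 | 14 | 100.00 | 14 | 85.71 | 12 | 100.00 | 14 | 100.00 | 14 | 100.00 | 14 |
| 24.37 | 174 | 100.00 | 14 | 100.00 | 14 | 100.00 | 14 | 100.00 | 14 | 100.00 | 14 | 100.00 | 14 | 100.00 | 14 | 100.00 | 14 |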
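
The K/% and T/% columns in Table 1 are numerically consistent with K = k/|U| and T = t/|R|, where k is the number of top-ranked objects inspected and t is the number of true outliers among them (for example, 7/714 ≈ 0.98% and 2/14 ≈ 14.29%). A minimal sketch of that computation follows; the function name, the ranking, and the outlier labels are hypothetical placeholders, not data from the experiments.

```python
def detection_rates(ranked_ids, true_outliers, k):
    """Return (K%, t, T%) for the top-k objects of an outlier ranking.

    ranked_ids    -- object ids sorted from most to least outlying
    true_outliers -- ids of the objects labeled as outliers (the set R)
    k             -- number of top-ranked objects to inspect
    """
    n = len(ranked_ids)
    true_set = set(true_outliers)
    t = len(set(ranked_ids[:k]) & true_set)   # true outliers found in top k
    k_pct = 100.0 * k / n                     # share of the universe inspected
    t_pct = 100.0 * t / len(true_set)         # share of true outliers recovered
    return k_pct, t, t_pct

# Hypothetical ranking over 714 objects with 14 labeled outliers,
# arranged so that two of them appear among the top 7.
ranked = list(range(714))
outliers = [0, 5] + list(range(700, 712))
k_pct, t, t_pct = detection_rates(ranked, outliers, k=7)
print(round(k_pct, 2), t, round(t_pct, 2))  # prints 0.98 2 14.29
```

Reading the tables this way, a method is better when T reaches 100% at a smaller K, since fewer objects have to be inspected to recover all labeled outliers.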

Table 2

Comparison of experimental results on the WBC data set UW (|UW|=483, |RUW|=39)

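| K/% | k | NREOD T1/% | NREOD t1 | KNN T2/% | KNN t2 | DIS T3/% | DIS t3 | FindCBLOF T4/% | FindCBLOF t4 | SEQ T5/% | SEQ t5 | IE T6/% | IE t6 | OD_NGE T7/% | OD_NGE t7 | NGOD T8/% | NGOD t8 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.83 | 4 | 10.26 | 4 | 10.26 | 4 | 10.26 | 4 | 10.26 | 4 | 7.69 | 3 | 10.26 | 4 | 10.26 | 4 | 10.26 | 4 |
| 1.66 | 8 | 20.51 | 8 | 20.51 | 8 | 12.82 | 5 | 17.95 | 7 | 17.95 | 7 | 17.95 | 7 | 20.51 | 8 | 20.51 | 8 |
| 3.31 | 16 | 41.03 | 16 | 41.03 | 16 | 28.21 | 11 | 35.90 | 14 | 35.90 | 14 | 38.46 | 15 | 38.46 | 15 | 41.03 | 16 |
| 4.97 | 24 | 61.54 | 24 | 51.28 | 20 | 46.15 | 18 | 53.85 | 21 | 53.85 | 21 | 53.85 | 21 | 58.97 | 23 | 58.97 | 23 |
| 6.63 | 32 | 74.36 | 29 | 69.23 | 27 | 61.54 | 24 | 69.23 | 27 | 71.79 | 28 | 71.79 | 28 | 76.92 | 30 | 74.36 | 29 |
| 8.28 | 40 | 87.18 | 34 | 82.05 | 32 | 74.36 | 29 | 82.05 | 32 | 82.05 | 32 | 84.62 | 33 | 87.18 | 34 | 87.18 | 34 |
| 10.14 | 49 | 100.00 | 39 | 94.87 | 37 | 92.31 | 36 | 89.74 | 35 | 89.74 | 35 | 92.31 | 36 | 97.44 | 38 | 97.44 | 38 |
| 11.59 | 56 | 100.00 | 39 | 100.00 | 39 | 100.00 | 39 | 97.44 | 38 | 100.00 | 39 | 100.00 | 39 | 100.00 | 39 | 100.00 | 39 |
| 13.25 | 64 | 100.00 | 39 | 100.00 | 39 | 100.00 | 39 | 100.00 | 39 | 100.00 | 39 | 100.00 | 39 | 100.00 | 39 | 100.00 | 39 |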

Table 3

Comparison of experimental results on the Lymphography data set UL (|UL|=148, |RUL|=6)

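| K/% | k | NREOD T1/% | NREOD t1 | KNN T2/% | KNN t2 | DIS T3/% | DIS t3 | FindCBLOF T4/% | FindCBLOF t4 | SEQ T5/% | SEQ t5 | IE T6/% | IE t6 | NGOD T7/% | NGOD t7 | OD_NGE T8/% | OD_NGE t8 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4.05 | 6 | 83.33 | 5 | 66.67 | 4 | 66.67 | 4 | 66.67 | 4 | 14.29 | 2 | 83.33 | 5 | 83.33 | 5 | 83.33 | 5 |
| 4.73 | 7 | 100.00 | 6 | 66.67 | 4 | 83.33 | 5 | 66.67 | 4 | 83.33 | 5 | 83.33 | 5 | 100.00 | 6 | 83.33 | 5 |
| 5.41 | 8 | 100.00 | 6 | 83.33 | 5 | 83.33 | 5 | 66.67 | 4 | 83.33 | 5 | 100.00 | 6 | 100.00 | 6 | 83.33 | 5 |
| 6.08 | 9 | 100.00 | 6 | 100.00 | 6 | 83.33 | 5 | 66.67 | 4 | 83.33 | 5 | 100.00 | 6 | 100.00 | 6 | 100.00 | 6 |
| 8.11 | 12 | 100.00 | 6 | 100.00 | 6 | 100.00 | 6 | 66.67 | 4 | 100.00 | 6 | 100.00 | 6 | 100.00 | 6 | 100.00 | 6 |
| 13.51 | 20 | 100.00 | 6 | 100.00 | 6 | 100.00 | 6 | 66.67 | 4 | 100.00 | 6 | 100.00 | 6 | 100.00 | 6 | 100.00 | 6 |
| 20.27 | 30 | 100.00 | 6 | 100.00 | 6 | 100.00 | 6 | 100.00 | 6 | 100.00 | 6 | 100.00 | 6 | 100.00 | 6 | 100.00 | 6 |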