JOURNAL OF SHANDONG UNIVERSITY (NATURAL SCIENCE) ›› 2016, Vol. 51 ›› Issue (11): 7-12. doi: 10.6040/j.issn.1671-9352.0.2016.238


Study on feature selection method based on information loss

LI Zhao1,2,3, SUN Zhan-quan2,3, LI Xiao2,3, LI Cheng2,3

  1. School of Software Engineering, Beijing Jiaotong University, Beijing 100044, China; 2. Shandong Computer Science Center (National Supercomputer Center in Jinan), Jinan 250014, Shandong, China; 3. Shandong Engineering Technology Research Center of E-government Big Data, Jinan 250014, Shandong, China; 4. Shandong Provincial Key Laboratory of Computer Networks, Jinan 250014, Shandong, China
  • Received: 2016-05-26    Online: 2016-11-20    Published: 2016-11-22

Abstract: The purpose of this paper is to realize fast feature selection by studying a metric for measuring the relationship between features and a method for calculating the correlation between the class variable and the selected feature subset. A novel information loss metric based on extended entropy was proposed and used to measure the correlation between features. To avoid computing complicated combinations of mutual information, a feature selection method based on information loss was proposed; it ensures that each newly selected feature adds the most information to the already selected feature set. Finally, the proposed method was applied to three practical classification datasets downloaded from the UCI repository. The feature selection results were evaluated with a support vector machine classifier and compared with those of several other feature selection methods. The comparison shows that the proposed method is more efficient than the other methods.
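The full algorithm is not reproduced on this abstract page. As a rough, non-authoritative sketch of the kind of greedy, information-theoretic selection loop the abstract describes, the Python fragment below scores discretized candidate features with a relevance-minus-redundancy surrogate built from pairwise entropies; the helper names (entropy, mutual_information, greedy_select) and the surrogate criterion are illustrative assumptions, not the authors' extended-entropy information-loss measure.

import numpy as np

def entropy(x):
    # Shannon entropy (in nats) of a 1-D array of discrete labels.
    _, counts = np.unique(x, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)))

def mutual_information(x, y):
    # I(X;Y) = H(X) + H(Y) - H(X,Y) for discrete 1-D arrays.
    joint = np.stack([x, y], axis=1)
    _, joint_labels = np.unique(joint, axis=0, return_inverse=True)
    return entropy(x) + entropy(y) - entropy(joint_labels)

def greedy_select(X, y, k):
    # Greedily pick k columns of a discrete-valued matrix X: each step keeps the
    # feature with the best trade-off between relevance to the class labels y and
    # average pairwise redundancy with the already selected subset, so no joint
    # mutual information over feature combinations has to be computed.
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < k and remaining:
        def score(j):
            relevance = mutual_information(X[:, j], y)
            redundancy = (np.mean([mutual_information(X[:, j], X[:, s]) for s in selected])
                          if selected else 0.0)
            return relevance - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

The selected columns could then be passed to a support vector machine (for example sklearn.svm.SVC) to mirror the UCI/SVM comparison reported in the abstract.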

Key words: information loss, information bottleneck theory, mutual information, feature selection, extended entropy

CLC Number: TP311
[1] YAO Xu, WANG Xiaodan, ZHANG Yuxi, et al. Summary of feature selection algorithms[J]. Control and Decision, 2012, 27(2): 161-166. (in Chinese)
[2] LIU Xiaoming, TANG Jinshan. Mass classification in mammograms using selected geometry and texture features, and a new SVM-based feature selection method[J]. IEEE Systems Journal, 2014, 8(3): 910-920.
[3] WANG De, NIE Feiping, HUANG Heng. Feature selection via global redundancy minimization[J].IEEE Transactions on Knowledge and Data Engineering, 2015, 27(10):2743-2755.
[4] HOU Chenping, NIE Feiping, LI Xuelong, et al. Joint embedding learning and sparse regression: a framework for unsupervised feature selection[J]. IEEE Transactions on Cybernetics, 2014, 44(6): 793-804.
[5] BACCIANELLA S, ESULI A, SEBASTIANI F. Feature selection for ordinal text classification[J]. Neural Computation, 2014, 26(3): 557-591.
[6] AROQUIARAJ I L, THANGAVEL K. Mammogram image feature selection using unsupervised tolerance rough set relative reduct algorithm[C] //International Conference on Pattern Recognition, Informatics and Mobile Engineering(PRIME). New York: IEEE, 2013:479-484.
[7] SUN Zhanquan, LI Zhao. Data intensive parallel feature selection method study[C] //International Joint Conference on Neural Networks(IJCNN). New York: IEEE, 2014: 2256-2262.
[8] XU Junling, ZHOU Yuming, CHEN Lin, et al. An unsupervised feature selection approach based on mutual information[J]. Journal of Computer Research and Development, 2012, 49(2): 372-382. (in Chinese)
[9] GOLDBERGER J, GORDON S, GREENSPAN H. Unsupervised image-set clustering using an information theoretic framework[J]. IEEE Transactions on Image Processing, 2006, 15(2):449-458.
[10] SIMONE C, LUCIO M, CARLO S R. Information bottleneck-based relevant knowledge representation in large-scale video surveillance systems[C] // IEEE International Conference on Acoustics, Speech and Signal Processing(ICASSP). New York: IEEE, 2014:4364-4368.
[11] CORTES C, VAPNIK V. Support-vector networks[J]. Machine Learning, 1995, 20(3): 273-297.
[12] FAN R E, CHEN P H, LIN C J. Working set selection using second order information for training support vector machines[J]. Journal of Machine Learning Research, 2005, 6(4):1889-1918.