JOURNAL OF SHANDONG UNIVERSITY(NATURAL SCIENCE) ›› 2016, Vol. 51 ›› Issue (11): 7-12. doi: 10.6040/j.issn.1671-9352.0.2016.238


Study on feature selection method based on information loss

LI Zhao 1,2,3, SUN Zhan- 2,3, LI Xiao 2,3, LI Cheng 2,3

  1. School of Software Engineering, Beijing Jiaotong University, Beijing 100044, China; 2. Shandong Computer Science Center(National Supercomputer Center in Jinan), Jinan 250014, Shandong, China; 3. Shandong Engineering Technology Research Center of Egovernment Big Data, Jinan 250014, Shandong, China; 4. Shandong Provincial Key Laboratory of Computer Networks, Jinan 250014, Shandong, China
  • Received: 2016-05-26 Online: 2016-11-20 Published: 2016-11-22

Abstract: This paper aims to achieve fast feature selection by studying two questions: how to measure the correlation between features, and how to calculate the correlation between the class variable and the selected feature subset. A novel information loss metric based on extended entropy is proposed to measure the correlation between features. To avoid calculating the complicated mutual information of feature combinations, a novel feature selection method based on information loss is proposed. The method ensures that each newly selected feature adds the most information to the already selected feature set. Finally, the proposed method is applied to three kinds of practical classification datasets downloaded from the UCI public datasets. The feature selection results are tested with a Support Vector Machine and compared with those of several other feature selection methods. The comparison shows that the proposed method is more efficient than the other methods.
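
The greedy idea in the abstract, where each step picks the feature that adds the most information to the already selected set, can be illustrated with a rough sketch. The abstract does not define the extended-entropy information loss metric, so the code below swaps in plain Shannon conditional entropy H(f | S) = H(S ∪ {f}) − H(S) as a stand-in measure of added information, and uses mutual information with the class only to choose the first feature. The function names (`entropy`, `columns`, `mutual_information`, `greedy_select`) and the toy data are hypothetical and do not come from the paper.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy in bits of a list of hashable values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def columns(rows, cols):
    """Project each row onto the given column indices, as tuples."""
    return [tuple(row[c] for c in cols) for row in rows]


def mutual_information(xs, ys):
    """I(X; Y) = H(X) + H(Y) - H(X, Y), estimated from counts."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def greedy_select(rows, labels, k):
    """Greedy forward selection (stand-in criterion, not the paper's metric).

    The first feature is the one most informative about the class; each
    later feature is the one with the largest conditional entropy given
    the features selected so far.
    """
    remaining = list(range(len(rows[0])))
    first = max(remaining,
                key=lambda c: mutual_information(columns(rows, [c]), labels))
    selected = [first]
    remaining.remove(first)
    while len(selected) < k and remaining:
        base = entropy(columns(rows, selected))
        # Gain is H(S + [c]) - H(S); it is zero for a redundant copy.
        best = max(remaining,
                   key=lambda c: entropy(columns(rows, selected + [c])) - base)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: feature 1 duplicates feature 0; feature 2 carries new information.
rows = [(a, a, b) for a in (0, 1) for b in (0, 1)]
labels = [2 * a + b for a in (0, 1) for b in (0, 1)]
print(greedy_select(rows, labels, 2))  # [0, 2]
```

In this toy case the duplicate feature adds zero bits once feature 0 is selected, so the sketch skips it in favour of feature 2. Note that a purely entropy-based gain would also reward noisy features on real data, which is one reason a class-aware metric such as the one proposed in the paper is needed.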

Key words: information loss, information bottleneck theory, mutual information, feature selection, extended entropy

CLC Number: TP311