JOURNAL OF SHANDONG UNIVERSITY (NATURAL SCIENCE) ›› 2019, Vol. 54 ›› Issue (3): 93-101. doi: 10.6040/j.issn.1671-9352.1.2018.051


A chunk increment partial least square algorithm

ZENG Xue-qiang1,2, YE Zhen-lin1, ZUO Jia-li2, WAN Zhong-ying2, WU Shui-xiu2   

  1. Information Engineering School, Nanchang University, Nanchang 330031, Jiangxi, China;
  2. School of Computer & Information Engineering, Jiangxi Normal University, Nanchang 330022, Jiangxi, China
  • Published: 2019-03-19

Abstract: Incremental learning is an effective and efficient technique for mining large-scale data. As an improved partial least square (PLS) method based on incremental learning, incremental partial least square (IPLS) offers competitive dimension-reduction performance. However, this approach has a drawback: training samples must be learned one by one, which consumes a great deal of time in on-line learning settings. To overcome this problem, we propose an extension of IPLS called chunk incremental partial least square (CIPLS), in which a chunk of training samples is processed at a time. Comparative experiments on the k8 cancer rescue mutants data set and the Reuters-21578 text classification corpus show that the proposed CIPLS algorithm is much more efficient than IPLS without sacrificing dimension-reduction performance.
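The chunk-wise idea in the abstract can be illustrated with a minimal sketch. This is not the authors' exact CIPLS update rules (which also handle incremental mean updates and multiple deflation steps); it only shows the core point that, for mean-centered data, the first PLS weight direction is proportional to Xᵀy, so the sufficient statistic can be accumulated one chunk at a time instead of one sample at a time. The class name `ChunkPLSSketch` is illustrative, not from the paper.

```python
import numpy as np

class ChunkPLSSketch:
    """Accumulates first-component PLS statistics one data chunk at a time."""

    def __init__(self, n_features):
        self.sxy = np.zeros(n_features)  # running X^T y

    def partial_fit(self, X_chunk, y_chunk):
        # one matrix product per chunk, instead of one update per sample
        self.sxy += X_chunk.T @ y_chunk

    def weight(self):
        # first PLS weight direction w1 is proportional to X^T y
        return self.sxy / np.linalg.norm(self.sxy)

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5)); X -= X.mean(axis=0)   # centered features
y = rng.standard_normal(100); y -= y.mean()              # centered response

model = ChunkPLSSketch(5)
for start in range(0, 100, 20):                          # chunks of 20 samples
    model.partial_fit(X[start:start + 20], y[start:start + 20])

# batch computation of the same direction, for comparison
batch_w = X.T @ y / np.linalg.norm(X.T @ y)
```

Because the chunk updates only sum matrix products, the accumulated statistic equals the batch statistic exactly; the efficiency gain over sample-by-sample IPLS comes from replacing many rank-one updates with one matrix product per chunk.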

Key words: incremental learning, partial least square, data chunk, dimension reduction

CLC Number: TP311
[1] WOLD S. Principal component analysis[J]. Chemometrics and Intelligent Laboratory Systems, 1987, 2(1): 37-52.
[2] LANDAUER T K, FOLTZ P W, LAHAM D. Introduction to latent semantic analysis[J]. Discourse Processes, 1998, 25(2/3): 259-284.
[3] BOULESTEIX A L. PLS dimension reduction for classification with microarray data[J]. Statistical Applications in Genetics and Molecular Biology, 2004, 3(1): 1-30.
[4] ZENG X Q, LI G Z, YANG J Y, et al. Dimension reduction with redundant gene elimination for tumor classification[J]. BMC Bioinformatics, 2008, 9(Suppl 6): S8.
[5] YAN J, ZHANG B, LIU N, et al. Effective and efficient dimensionality reduction for large-scale and streaming data preprocessing[J]. IEEE Transactions on Knowledge and Data Engineering, 2006, 18(3): 320-333.
[6] LI Xue, JIANG Shuqiang. Incremental learning and object recognition system based on intelligent HCI: a survey[J]. CAAI Transactions on Intelligent Systems, 2017, 12(2): 140-149.
[7] BU Fanyu, CHEN Zhikui, ZHANG Qingchen. Incremental updating method for big data feature learning[J]. Computer Engineering and Applications, 2015, 51(12): 21-26.
[8] OZAWA S, PANG S, KASABOV N. Online feature extraction for evolving intelligent systems[M] //OZAWA S, PANG S, KASABOV N. eds. Evolving Intelligent Systems. Hoboken: John Wiley & Sons, Inc., 2010: 151-171.
[9] WENG J Y, ZHANG Y L, HWANG W S. Candid covariance-free incremental principal component analysis[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2003, 25(8): 1034-1040.
[10] ZENG X Q, LI G Z. Dimension reduction for p53 protein recognition by using incremental partial least squares[J]. IEEE Transactions on NanoBioscience, 2014, 13(2): 73-79.
[11] HIRAOKA K, HIDAI K, HAMAHIRA M, et al. Successive learning of linear discriminant analysis: Sanger-type algorithm[C] //Proceedings of the International Conference on Pattern Recognition. Barcelona: IEEE, 2000: 664-667.
[12] PANG S, OZAWA S, KASABOV N. Incremental linear discriminant analysis for classification of data streams[J]. IEEE Transactions on Systems, Man and Cybernetics: Part B (Cybernetics), 2005, 35(5): 905-914.
[13] OZAWA S, PANG S, KASABOV N. Incremental learning of chunk data for online pattern classification systems[J]. IEEE Transactions on Neural Networks, 2008, 19(6): 1061-1074.
[14] ZENG Xueqiang, ZHAO Bingjuan, XIANG Run, et al. Partial least squares based facial age estimation[J]. Journal of Nanchang University (Engineering & Technology), 2017, 39(4): 380-385.
[15] MARTÍNEZ J L, SAULO H, ESCOBAR H B, et al. A new model selection criterion for partial least squares regression[J]. Chemometrics and Intelligent Laboratory Systems, 2017, 169: 64-78.
[16] HELLAND I S. On the structure of partial least squares regression[J]. Communications in Statistics - Simulation and Computation, 1988, 17(2): 581-607.
[17] DE JONG S. SIMPLS: an alternative approach to partial least squares regression[J]. Chemometrics and Intelligent Laboratory Systems, 1993, 18(3): 251-263.
[18] DANZIGER S A, BARONIO R, HO L, et al. Predicting positive p53 cancer rescue regions using most informative positive (MIP) active learning[J]. PLOS Computational Biology, 2009, 5(9): e1000498.
[19] HTUN P T, KHAING K T. Important roles of data mining techniques for anomaly intrusion detection system[J]. International Journal of Advanced Research in Computer Engineering & Technology, 2013, 2(5): 1850-1854.
[20] WITTEN I, FRANK E. Data mining: practical machine learning tools and techniques[J]. ACM Sigmod Record, 2005, 31(1): 76-77.
[21] YANG Y, LIU X. A re-examination of text categorization methods [C] // Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Berkeley: ACM Press, 1999: 42-49.