《山东大学学报(理学版)》 (Journal of Shandong University (Natural Science)), 2019, Vol. 54, Issue (3): 93-101. DOI: 10.6040/j.issn.1671-9352.1.2018.051
曾雪强1,2,叶震麟1,左家莉2,万中英2,吴水秀2
ZENG Xue-qiang1,2, YE Zhen-lin1, ZUO Jia-li2, WAN Zhong-ying2, WU Shui-xiu2
Abstract: Incremental learning is an effective data-processing technique for mining large-scale data. The incremental partial least squares (IPLS) model is an improved partial least squares algorithm built on incremental techniques and achieves good dimensionality-reduction performance. However, IPLS must perform an incremental update of the model for every newly arrived sample, which leads to long training times. To address this problem, a chunk incremental partial least squares (CIPLS) algorithm is proposed, based on the idea of updating the data in blocks. CIPLS divides the sample data into several chunks and then updates the model incrementally one chunk at a time, which greatly reduces the update frequency of the model and improves its learning efficiency. Comparative experiments on the K8 version of the p53 protein dataset and the Reuters text-categorization corpus show that CIPLS substantially shortens the training time of the incremental partial least squares model.
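To make the chunk-wise updating idea concrete, below is a minimal Python/NumPy sketch, not the authors' CIPLS algorithm: it keeps only the sufficient statistics needed for the first PLS weight direction (single response, first component) and refreshes them once per chunk rather than once per sample. The class name ChunkIncrementalPLS1, the chunk size of 100, and the synthetic data are all illustrative assumptions.

```python
import numpy as np

class ChunkIncrementalPLS1:
    """Hypothetical sketch of chunk-wise incremental PLS (single response,
    first component only). Sufficient statistics of the X-y cross-covariance
    are accumulated chunk by chunk, so the model is updated once per chunk
    instead of once per sample; this mirrors the chunk-update idea behind
    CIPLS, not the exact algorithm of the paper."""

    def __init__(self, n_features):
        self.n = 0                              # samples seen so far
        self.sum_x = np.zeros(n_features)       # running sum of X rows
        self.sum_y = 0.0                        # running sum of y
        self.sxy = np.zeros(n_features)         # running sum of x_i * y_i

    def update_chunk(self, X_chunk, y_chunk):
        """One incremental update per chunk of samples."""
        self.n += X_chunk.shape[0]
        self.sum_x += X_chunk.sum(axis=0)
        self.sum_y += y_chunk.sum()
        self.sxy += X_chunk.T @ y_chunk

    def weight_vector(self):
        """First PLS weight direction, proportional to Cov(X, y),
        computed from the accumulated statistics without revisiting
        previously seen chunks."""
        mean_x = self.sum_x / self.n
        mean_y = self.sum_y / self.n
        cov_xy = self.sxy / self.n - mean_x * mean_y
        return cov_xy / np.linalg.norm(cov_xy)

# Usage: stream the data in chunks instead of one sample at a time.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
y = X[:, 0] * 2.0 + rng.normal(scale=0.1, size=1000)

model = ChunkIncrementalPLS1(n_features=20)
for start in range(0, X.shape[0], 100):          # chunk size = 100
    model.update_chunk(X[start:start + 100], y[start:start + 100])
print(model.weight_vector()[:3])                 # dominant weight on feature 0
```

The design point mirrors the abstract: each update touches only the statistics of the newly arrived chunk, so the update frequency drops from once per sample to once per chunk, while the projection direction can still be recovered from the accumulated cross-covariance at any time.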