Journal of Shandong University (Natural Science), 2023, Vol. 58, Issue (1): 67-75. doi: 10.6040/j.issn.1671-9352.2.2021.139
LIANG Yun1, MEN Chang-qian1, WANG Wen-jian2*
Abstract: AdaBoost is an ensemble algorithm that combines multiple base learners into a strong learner through a suitable combination strategy; its performance depends on the accuracy and diversity of the base learners. Because low-accuracy weak learners often yield a poor final strong classifier, this paper proposes the MDTAda model to further improve classification accuracy. The model first uses the Gini index to iteratively construct an incomplete decision tree, then attaches simple classifiers to the impure pseudo-leaf nodes to form a model decision tree (MDT), and finally uses the MDT as the base classifier of AdaBoost, producing the strong classifier by weighted averaging. Experiments on benchmark datasets show that, compared with the traditional AdaBoost algorithm, the proposed algorithm achieves better generalization performance and a better margin distribution, and needs fewer iterations to reach the same accuracy as AdaBoost.
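The abstract's pipeline — a Gini-chosen split, simple models at impure nodes, and AdaBoost weighting over the resulting trees — can be sketched as below. This is a minimal illustration, not the paper's implementation: the tree is only one level deep, the "simple classifier" at each impure child is a weighted nearest-class-mean model, and all names (`ModelStump`, `adaboost_mdt`) are ours.

```python
import math

def gini(counts):
    """Weighted Gini impurity of a {label: total weight} dict."""
    tot = sum(counts.values())
    return 1.0 - sum((c / tot) ** 2 for c in counts.values()) if tot else 0.0

class ModelStump:
    """Depth-1 tree whose split minimizes weighted Gini; each child is
    handled by a weighted nearest-class-mean model instead of a
    majority-vote leaf (a stand-in for the MDT's simple classifiers)."""
    def fit(self, X, y, w):
        best = None
        for j in range(len(X[0])):
            for t in sorted({x[j] for x in X})[:-1]:
                side = [{}, {}]
                for x, yi, wi in zip(X, y, w):
                    d = side[x[j] <= t]
                    d[yi] = d.get(yi, 0.0) + wi
                score = sum(sum(s.values()) * gini(s) for s in side)
                if best is None or score < best[0]:
                    best = (score, j, t)
        _, self.j, self.t = best
        # index 0: right child (x[j] > t), index 1: left child (x[j] <= t)
        self.means = [self._class_means(
            [(x, yi, wi) for x, yi, wi in zip(X, y, w)
             if (x[self.j] <= self.t) == b]) for b in (False, True)]
        return self

    @staticmethod
    def _class_means(pts):
        acc = {}  # label -> [feature-sum vector, total weight]
        for x, yi, wi in pts:
            if yi not in acc:
                acc[yi] = [[0.0] * len(x), 0.0]
            for k, v in enumerate(x):
                acc[yi][0][k] += wi * v
            acc[yi][1] += wi
        return {lab: [s / tw for s in vec] for lab, (vec, tw) in acc.items()}

    def predict_one(self, x):
        means = self.means[x[self.j] <= self.t] or self.means[x[self.j] > self.t]
        return min(means, key=lambda lab:
                   sum((a - b) ** 2 for a, b in zip(x, means[lab])))

def adaboost_mdt(X, y, rounds=5):
    """Binary AdaBoost (labels in {-1, +1}) over ModelStump base learners."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        h = ModelStump().fit(X, y, w)
        pred = [h.predict_one(x) for x in X]
        err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
        err = min(max(err, 1e-10), 1 - 1e-10)          # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)        # learner weight
        w = [wi * math.exp(-alpha * yi * p) for wi, yi, p in zip(w, y, pred)]
        z = sum(w)
        w = [wi / z for wi in w]                       # renormalize
        ensemble.append((alpha, h))
    return lambda x: 1 if sum(a * h.predict_one(x) for a, h in ensemble) >= 0 else -1

# Usage: two well-separated clusters, labels -1 and +1.
X = [(0, 0), (1, 0), (0, 1), (1, 1), (3, 3), (4, 3), (3, 4), (4, 4)]
y = [-1, -1, -1, -1, 1, 1, 1, 1]
predict = adaboost_mdt(X, y, rounds=3)
```

In the full MDT the tree is grown iteratively and only nodes that remain impure receive a simple classifier; the stump above compresses that idea into a single split so the AdaBoost weighting step stays visible.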