
Journal of Shandong University (Natural Science) ›› 2024, Vol. 59 ›› Issue (1): 1-10, 45. doi: 10.6040/j.issn.1671-9352.0.2023.512

• Invited Review •

Strategic limit theory and strategic statistical learning

Xiaodong YAN (严晓东)

  1. Zhongtai Securities Institute for Financial Studies, Shandong University, Jinan 250100, Shandong, China
  • Received: 2023-12-05 Online: 2024-01-20 Published: 2024-01-19
  • About the author: Xiaodong YAN (born 1988), male, associate research fellow, doctoral supervisor, Ph.D. Research interests: machine learning, econometrics, financial technology, and big-data statistical analysis. E-mail: yanxiaodong@sdu.edu.cn
  • Funding:
    National Natural Science Foundation of China (12371292); Statistical Science Research Project of the National Bureau of Statistics (2022LY080); National Key R&D Program of the Ministry of Science and Technology (2023YFA1008701)

Abstract:

Nonlinear expectation is an original research direction pioneered by Academician Peng Shige of Shandong University, and it is becoming increasingly important across many fields of scientific research. The rise of big data and artificial intelligence has given further impetus to innovative theoretical and applied research on nonlinear expectation. Recently, Shandong University's "Nonlinear Expectation" team developed "strategic limit theory" from the strategic game process of multi-armed bandits, a significant breakthrough at the intersection of nonlinear probability theory and reinforcement learning that has transformed the research paradigm of traditional statistical methods. Drawing on the 10 fundamental mathematical problems of artificial intelligence proposed by Academician Xu Zongben, the application guide of the 2022 major research plan of the National Natural Science Foundation of China on interpretable and general-purpose artificial intelligence methods, and the application guides for theoretical research on the "mathematical foundations of data science and artificial intelligence" in the 2021 and 2022 projects of the key special program "Mathematics and Applied Research" of the Ministry of Science and Technology, this article uses the concept of "strategy" to explore and reveal the nature and laws of artificial intelligence, and attempts to identify the motivating source and theoretical basis for initiating and promoting innovation in artificial intelligence technology. Unlike the traditional law of large numbers and central limit theorem, which support statistical learning under the independent and identically distributed assumption, strategic limit theory, with its strategic law of large numbers and central limit theorem, removes the restriction that data be exchangeable, seeks the optimal distribution in a larger probability space, and proposes the optimal strategy path for attaining that distribution. The corresponding statistical learning process is named strategic statistical learning, and it provides theoretical support for interpretable and trustworthy statistical methods in complex machine learning.
The applications of strategic limit theory discussed in this article include, but are not limited to: (1) strategic sampling of massive data; (2) online learning of streaming data; (3) the central limit theorem of reinforcement learning; (4) differential privacy protection of data; (5) strategic integration in federated learning; (6) information reconstruction in transfer learning and meta-learning; (7) the fusion of knowledge reasoning and data-driven methods.

Key words: artificial intelligence, strategic limit theory, mathematical foundation, big data analysis, reinforcement learning, online learning, transfer learning, federated learning, data privacy protection, knowledge reasoning and data driving

CLC Number:

  • TP18

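Fig. 1

Probability density functions of the spike distribution f(z) and the double normal distribution g(z) as the parameters k and h vary; the green line is the probability density function of the standard normal distribution.

Fig. 2

Coverage probabilities on the interval [a, b] for the three spike-distribution probability density functions Xt (blue), Yt (green) and R10, π* (red), where the expectations of Xt and Yt are set to μL=-1 and μR=1 with common variance σL=σR=1, and three intervals are considered: [0, 1], [0.5, 1.5], [1, 2].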
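
The setting in the caption above (two arms with means -1 and 1, unit standard deviation, and a target interval [a, b]) can be explored with a small Monte Carlo sketch. It is only an illustration of how an arm-selection strategy changes the distribution of the sample mean, not the article's method: the feedback rule below (pull the right arm while the running mean is under the interval midpoint) is a hypothetical stand-in for the optimal strategy π*, and the sample size n=10, the replication count, the seed, and all function names are assumptions made for the example.

```python
import random


def sample_mean(n, choose_arm, rng, mu_left=-1.0, mu_right=1.0, sigma=1.0):
    """Draw n rewards from a two-armed Gaussian bandit and return their mean.

    choose_arm(t, total) returns True to pull the right arm (mean mu_right)
    and False to pull the left arm (mean mu_left); t is the number of draws
    made so far and total is their sum.
    """
    total = 0.0
    for t in range(n):
        mu = mu_right if choose_arm(t, total) else mu_left
        total += rng.gauss(mu, sigma)
    return total / n


def feedback_rule(target):
    """Hypothetical rule (an assumption, not the article's pi*):
    pull the right arm while the running mean is below target."""
    def choose(t, total):
        return t == 0 or total / t < target
    return choose


def always_right(t, total):
    """Non-adaptive baseline: rewards are i.i.d. N(mu_right, sigma^2)."""
    return True


def coverage(a, b, choose_arm, n=10, reps=2000, seed=0):
    """Monte Carlo estimate of P(a <= sample mean <= b) under a given rule."""
    rng = random.Random(seed)
    hits = sum(a <= sample_mean(n, choose_arm, rng) <= b for _ in range(reps))
    return hits / reps


if __name__ == "__main__":
    # The three intervals listed in the caption; the midpoint is the target.
    for a, b in [(0.0, 1.0), (0.5, 1.5), (1.0, 2.0)]:
        adaptive = coverage(a, b, feedback_rule((a + b) / 2))
        fixed = coverage(a, b, always_right)
        print(f"[{a}, {b}]  feedback: {adaptive:.3f}  always-right: {fixed:.3f}")
```

For the interval [0, 1], always pulling the right arm centres the sample mean on the right endpoint, so roughly half of the runs fall outside, while the feedback rule steers the mean toward the midpoint and should cover the interval far more often. Because each draw under the feedback rule depends on the earlier rewards, the resulting sequence is not exchangeable, which is the situation the abstract contrasts with the independent and identically distributed setting.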
