Journal of Shandong University (Natural Science) (《山东大学学报(理学版)》), 2024, Vol. 59, Issue (7): 113-121. doi: 10.6040/j.issn.1671-9352.1.2023.040
Jie JI¹, Chengjie SUN¹*, Lili SHAN¹, Boyue SHANG², Lei LIN¹
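Abstract:
This paper studies the automatic classification of telecom fraud cases. It defines a classification scheme for telecom and online fraud based on situational analysis, implements a de-identification method that protects privacy in case texts, and proposes a prompt-learning-based method for classifying telecom and online fraud cases. Experimental results show that, on the dataset constructed in this paper, the proposed method outperforms the BERT-based classification method by 1% to 2% on accuracy, F1 score, and other metrics.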
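The abstract does not say how the de-identification step is implemented. Purely as an illustration, the sketch below assumes a simple rule-based approach in which numeric identifiers in a case text are replaced with category placeholders. The pattern set, the placeholder labels, and the `deidentify` function are hypothetical and are not taken from the paper. Personal names and addresses would need a named entity recognizer and are left out here.

```python
import re

# Hypothetical rules for illustration; the paper's actual rules are not given in the abstract.
# Order matters: longer identifiers are replaced first so that shorter patterns
# cannot match inside them.
PATTERNS = [
    # 18-character resident ID: 17 digits followed by a digit or X
    ("ID_NUMBER", re.compile(r"(?<!\d)\d{17}[\dXx](?!\d)")),
    # 11-digit mainland mobile number starting with 13 to 19
    ("PHONE", re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)")),
    # 16 to 19 digit bank card number (an all-digit 18-digit card would
    # already have been tagged as ID_NUMBER by the first rule)
    ("BANK_CARD", re.compile(r"(?<!\d)\d{16,19}(?!\d)")),
]


def deidentify(text: str) -> str:
    """Replace each matched identifier with a bracketed category placeholder."""
    for label, pattern in PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text


if __name__ == "__main__":
    sample = "Transfer to account 6222021234567890, caller 13812345678, ID 11010519900101123X."
    print(deidentify(sample))
```

Keeping the category in the placeholder, rather than deleting the value, preserves the cue that a bank account or phone number was involved, which a downstream classifier might still find useful.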