JOURNAL OF SHANDONG UNIVERSITY (NATURAL SCIENCE), 2024, Vol. 59, Issue (7): 113-121. doi: 10.6040/j.issn.1671-9352.1.2023.040

• Review •

A prompt learning approach for telecom network fraud case classification

Jie JI¹, Chengjie SUN¹,*, Lili SHAN¹, Boyue SHANG², Lei LIN¹

  1. Faculty of Computing, Harbin Institute of Technology, Harbin 150001, Heilongjiang, China
  2. Xiangfang Branch, Harbin Public Security Bureau, Harbin 150000, Heilongjiang, China
  • Received: 2023-10-18; Online: 2024-07-20; Published: 2024-07-15
  • Contact: Chengjie SUN; E-mail: jijie@insun.hit.edu.cn; sunchengjie@hit.edu.cn

Abstract:

For the automatic classification of telecom network fraud cases, this paper formulates a classification system for telecom network fraud based on situational analysis, implements a privacy-protection method that de-identifies case text, and proposes a prompt-learning-based method for classifying telecom network fraud cases. Experimental results on the dataset constructed in the paper show that the accuracy and F1-score of the method are on average 1 to 2 percentage points higher than those of the BERT-based classification method.

Key words: prompt learning, telecom network fraud, de-identification, case classification

CLC Number: TP391

Fig.1 Overall structure diagram of case classification

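Table 1 Number ranges

Category | Range
Mobile phone number | 11 consecutive digits
ID card number | 18 consecutive digits (the last may be the Roman numeral X)
Bank card number | 13 to 19 consecutive digits
QQ number | 5 to 11 consecutive digits
WeChat ID | 6 to 20 letters, digits, underscores and hyphens, beginning with a letter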
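
The ranges in Table 1 overlap: an 11-digit string also fits the QQ range, and an 18-digit string fits both the ID card and bank card ranges, so length alone cannot always tell the types apart. The Python sketch below encodes each row as an anchored regular expression and returns every type a token could belong to; the dictionary keys and the function name `candidate_types` are illustrative assumptions, not names from the paper.

```python
import re
from typing import List

# One anchored pattern per row of Table 1 (keys are illustrative names).
RULES = {
    "mobile_phone": re.compile(r"\d{11}"),                    # 11 digits
    "id_card": re.compile(r"\d{17}[\dX]"),                    # 18 chars, last may be X
    "bank_card": re.compile(r"\d{13,19}"),                    # 13 to 19 digits
    "qq": re.compile(r"\d{5,11}"),                            # 5 to 11 digits
    "wechat_id": re.compile(r"[A-Za-z][A-Za-z0-9_-]{5,19}"),  # 6 to 20 chars, letter first
}

def candidate_types(token: str) -> List[str]:
    """Return every Table 1 type whose pattern matches the whole token."""
    return [name for name, pattern in RULES.items() if pattern.fullmatch(token)]

print(candidate_types("13812345678"))         # ['mobile_phone', 'qq']
print(candidate_types("123456789012345678"))  # ['id_card', 'bank_card']
```

Using `fullmatch` keeps a longer digit run from being accepted as a shorter type; a de-identification pass over running text would instead search for these patterns inside the text.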

Table 2 Regular expressions for the removed information

Removed information | Regular expression
Number | [\da-zA-Z_]{6,}
Year and month of birth (birth) | (\d{2,4}年\d{1,2}月\d{1,2}日)|(\d{2,4}-\d{1,2}-\d{1,2})|(\d{2,4}.\d{1,2}.\d{1,2})|\d{2,4}年
Date of birth | ({birth}出?生)|(出生(日期|年月|于)[:：]?{birth})
E-mail address | [a-zA-Z0-9]*@(qq|163|gmail)\.com
URL | [hH][tT]{2}[pP][sS]?[:：;；]?(//|//|//)?[a-zA-Z0-9/.]+[wW]{3}\.)?[a-zA-Z0-9.]+\.(com|COM|vip|VIP|cc|CC|site|SITE|top|TOP)
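
A minimal Python sketch of regex-based de-identification using three of the Table 2 patterns. It assumes that spaces inside the printed quantifiers (such as `{6, }`) are typesetting artifacts, that the unescaped `.` in the date pattern was meant as a literal dot, and that `{birth}` is substituted as a single group so the 出生/生 context applies to every date form. The replacement tags `[BIRTH]`, `[EMAIL]` and `[NUM]` are hypothetical. E-mail addresses are masked before the generic number pattern, because that pattern would otherwise consume the local part of the address first.

```python
import re

# Date forms from the "birth" row of Table 2, wrapped in one non-capturing
# group (assumption) so the surrounding context applies to every alternative.
BIRTH = (r"(?:\d{2,4}年\d{1,2}月\d{1,2}日"
         r"|\d{2,4}-\d{1,2}-\d{1,2}"
         r"|\d{2,4}\.\d{1,2}\.\d{1,2}"
         r"|\d{2,4}年)")

# A date followed by 出生/生, or 出生日期/出生年月/出生于 (optional colon) then a date.
BIRTH_IN_CONTEXT = re.compile(BIRTH + r"出?生|出生(?:日期|年月|于)[:：]?" + BIRTH)
EMAIL = re.compile(r"[a-zA-Z0-9]*@(?:qq|163|gmail)\.com")
NUMBER = re.compile(r"[\da-zA-Z_]{6,}")

def deidentify(text: str) -> str:
    """Replace birth dates, e-mail addresses and long alphanumeric runs with tags."""
    text = BIRTH_IN_CONTEXT.sub("[BIRTH]", text)
    text = EMAIL.sub("[EMAIL]", text)  # before NUMBER, which would eat the local part
    text = NUMBER.sub("[NUM]", text)
    return text

print(deidentify("张某1990年3月5日出生，手机13812345678，邮箱abc123@qq.com"))
# 张某[BIRTH]，手机[NUM]，邮箱[EMAIL]
```

The tags are kept shorter than six letters so that the generic number pattern does not re-match them on a later pass.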

Fig.2 Fine-tuning process of the pre-trained language model

Fig.3 Prediction process of the pre-trained language model

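Table 3 Definition of the case classification system

Category name | Definition | Existing categories covered
Shopping and consumption | Spending money to buy goods, services, etc. | Impersonating e-commerce/logistics customer service; Impersonating military/police for purchases; Fake online game product transactions; Fake shopping and services
Business handling | Handling various kinds of business (no spending required) | Loans and credit card agency services; Fake credit reporting
Dating and friendship | Establishing a relationship (marriage, romance, friendship) | Online dating and friendship (excluding fake online investment and wealth management)
Cooperating with official duties | Cooperating with administrative, judicial or other personnel in performing official duties | Impersonating police, procuratorate, court and government agencies
Interpersonal assistance | Trusting a certain interpersonal relationship | Impersonating leaders and acquaintances
Investment for profit | Activities aimed at making a profit | Order-brushing rebates; Fake online investment and wealth management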
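
Table 3 is a many-to-one mapping from the existing fine-grained categories to six situational categories. The sketch below stores that mapping as a Python dictionary and inverts it for lookup; the English category strings are translations of the Chinese names in Table 3, and `situation_of` is an illustrative helper. The Wanghei (网黑) category, which Table 3 does not list, maps to `None`.

```python
from typing import Dict, List, Optional

# Situational category (Table 3) -> existing fine-grained categories it covers.
SITUATIONS: Dict[str, List[str]] = {
    "Shopping and consumption": [
        "Impersonating e-commerce/logistics customer service",
        "Impersonating military/police for purchases",
        "Fake online game product transactions",
        "Fake shopping and services",
    ],
    "Business handling": [
        "Loans and credit card agency services",
        "Fake credit reporting",
    ],
    "Dating and friendship": [
        "Online dating and friendship (excluding fake online investment and wealth management)",
    ],
    "Cooperating with official duties": [
        "Impersonating police, procuratorate, court and government agencies",
    ],
    "Interpersonal assistance": ["Impersonating leaders and acquaintances"],
    "Investment for profit": [
        "Order-brushing rebates",
        "Fake online investment and wealth management",
    ],
}

# Invert the many-to-one mapping for lookup by fine-grained category.
FINE_TO_SITUATION: Dict[str, str] = {
    fine: situation for situation, fines in SITUATIONS.items() for fine in fines
}

def situation_of(fine_category: str) -> Optional[str]:
    """Situational category for a fine-grained one, or None if Table 3 omits it."""
    return FINE_TO_SITUATION.get(fine_category)

print(situation_of("Fake credit reporting"))  # Business handling
```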

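Table 4 Data distribution

Category | Number of samples
Order-brushing rebates | 35 459
Impersonating e-commerce/logistics customer service | 13 772
Fake online investment and wealth management | 11 836
Loans and credit card agency services | 11 105
Fake credit reporting | 8 464
Fake shopping and services | 7 058
Impersonating leaders and acquaintances | 4 563
Impersonating police, procuratorate, court and government agencies | 4 407
Fake online game product transactions | 2 155
Online dating and friendship (excluding fake online investment and wealth management) | 1 654
Impersonating military/police for purchases | 1 197
Wanghei (网黑) cases | 1 092
Total | 102 762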
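
The counts in Table 4 can be checked and summarized in a few lines. The sketch below confirms the 102 762 total and quantifies the skew: the largest class holds about 34.5% of the samples and is roughly 32 times the size of the smallest. Imbalance of this kind is one reason macro- and weighted-averaged F1 can differ noticeably. The helper name `summarize` is illustrative.

```python
from typing import Dict

# Sample counts per category, transcribed from Table 4.
COUNTS: Dict[str, int] = {
    "Order-brushing rebates": 35459,
    "Impersonating e-commerce/logistics customer service": 13772,
    "Fake online investment and wealth management": 11836,
    "Loans and credit card agency services": 11105,
    "Fake credit reporting": 8464,
    "Fake shopping and services": 7058,
    "Impersonating leaders and acquaintances": 4563,
    "Impersonating police, procuratorate, court and government agencies": 4407,
    "Fake online game product transactions": 2155,
    "Online dating and friendship": 1654,
    "Impersonating military/police for purchases": 1197,
    "Wanghei (网黑) cases": 1092,
}

def summarize(counts: Dict[str, int]) -> Dict[str, float]:
    """Total size, share of the largest class (%), and largest/smallest ratio."""
    total = sum(counts.values())
    largest, smallest = max(counts.values()), min(counts.values())
    return {
        "total": total,
        "largest_share_pct": round(100 * largest / total, 1),
        "imbalance_ratio": round(largest / smallest, 1),
    }

print(summarize(COUNTS))
# {'total': 102762, 'largest_share_pct': 34.5, 'imbalance_ratio': 32.5}
```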

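Table 5 Unified category labels

Category name | Category label
Order-brushing rebates | 刷单返利 (order brushing and rebates)
Impersonating e-commerce/logistics customer service | 电商物流 (e-commerce and logistics)
Fake online investment and wealth management | 投资理财 (investment and wealth management)
Loans and credit card agency services | 贷款信用 (loans and credit)
Fake credit reporting | 虚假征信 (fake credit reporting)
Fake shopping and services | 购物服务 (shopping and services)
Impersonating police, procuratorate, court and government agencies | 政府机关 (government agencies)
Impersonating leaders and acquaintances | 领导熟人 (leaders and acquaintances)
Fake online game product transactions | 游戏产品 (game products)
Online dating and friendship (excluding fake online investment and wealth management) | 婚恋交友 (dating and friendship)
Impersonating military/police for purchases | 军警购物 (military/police purchases)
Wanghei (网黑) cases | 网黑案件 (Wanghei cases)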
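
Every unified label in Table 5 is exactly four Chinese characters long. One plausible reason, assumed here rather than stated in the table, is that a masked-language-model prompt can then reserve a fixed number of [MASK] slots for the label: Chinese BERT tokenizers typically split CJK text into single characters, so each label occupies four mask positions. The template string in `build_prompt` is hypothetical; the paper's actual prompt wording is not reproduced in this section.

```python
# Unified labels from Table 5: the Chinese words the masked LM has to fill in.
LABEL_WORDS = [
    "刷单返利", "电商物流", "投资理财", "贷款信用", "虚假征信", "购物服务",
    "政府机关", "领导熟人", "游戏产品", "婚恋交友", "军警购物", "网黑案件",
]
MASK = "[MASK]"
N_MASK = 4  # one slot per label character

# Every label must fit the fixed number of mask slots.
assert all(len(word) == N_MASK for word in LABEL_WORDS)

def build_prompt(case_text: str) -> str:
    """Hypothetical template: '案件类型：' ('case type:') + four masks + the case text."""
    return f"案件类型：{MASK * N_MASK}。{case_text}"

print(build_prompt("受害人被诱导刷单"))
# 案件类型：[MASK][MASK][MASK][MASK]。受害人被诱导刷单
```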

Fig.4 Training and prediction process of the BERT model combined with prompt learning
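
In the prediction step of a prompt-based classifier like the one in Fig.4, the masked language model outputs a probability distribution over its vocabulary at each [MASK] position, and each candidate label is scored from those distributions. The sketch below uses one common scoring rule, the sum of per-position log-probabilities of the label's characters, on toy distributions. Whether the paper scores labels exactly this way is an assumption, and `score_label` and `predict` are hypothetical names.

```python
import math
from typing import Dict, List, Sequence

def score_label(label: str, mask_probs: List[Dict[str, float]], floor: float = 1e-12) -> float:
    """Sum of log P(label character i at [MASK] slot i) over all slots."""
    if len(label) != len(mask_probs):
        raise ValueError("label length must equal the number of [MASK] slots")
    # Characters missing from a slot's distribution get a tiny floor probability.
    return sum(math.log(max(dist.get(ch, 0.0), floor)) for ch, dist in zip(label, mask_probs))

def predict(labels: Sequence[str], mask_probs: List[Dict[str, float]]) -> str:
    """Return the candidate label with the highest score."""
    return max(labels, key=lambda lab: score_label(lab, mask_probs))

# Toy distributions for four [MASK] slots (illustrative, not real model output).
toy = [
    {"刷": 0.6, "电": 0.3},
    {"单": 0.7, "商": 0.2},
    {"返": 0.5, "物": 0.4},
    {"利": 0.8, "流": 0.1},
]
print(predict(["刷单返利", "电商物流"], toy))  # 刷单返利
```

Summing log-probabilities treats the four slots as independent, which is a simplification; in a real system the distributions would come from the model's masked-LM head rather than a hand-written list.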

Table 6 Experimental results

Model | Acc | Macro Avg F1 | Weighted Avg F1
TextCNN | 0.8847 | 0.8411 | 0.8839
ERNIE | 0.8848 | 0.8460 | 0.8843
RoBERTa | 0.8828 | 0.8467 | 0.8814
BERT (base) | 0.8849 | 0.8503 | 0.8836
BERT+prompt | 0.9018 (+1.69%) | 0.8764 (+2.61%) | 0.9038 (+2.24%)
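
The parenthesized figures for BERT+prompt in Table 6 appear to be absolute differences from BERT (base) expressed in percentage points, not relative percentages. The short sketch below recomputes them from the listed scores; `gain_in_points` is an illustrative helper.

```python
from typing import Dict

# Scores from Table 6: (accuracy, macro-averaged F1, weighted-average F1).
RESULTS = {
    "TextCNN": (0.8847, 0.8411, 0.8839),
    "ERNIE": (0.8848, 0.8460, 0.8843),
    "RoBERTa": (0.8828, 0.8467, 0.8814),
    "BERT(base)": (0.8849, 0.8503, 0.8836),
    "BERT+prompt": (0.9018, 0.8764, 0.9038),
}
METRICS = ("acc", "macro_f1", "weighted_f1")

def gain_in_points(model: str, baseline: str) -> Dict[str, float]:
    """Absolute improvement of `model` over `baseline`, in percentage points."""
    return {
        metric: round((a - b) * 100, 2)
        for metric, a, b in zip(METRICS, RESULTS[model], RESULTS[baseline])
    }

print(gain_in_points("BERT+prompt", "BERT(base)"))
# {'acc': 1.69, 'macro_f1': 2.61, 'weighted_f1': 2.02}
```

This reproduces +1.69 for accuracy and +2.61 for macro-averaged F1. For weighted-average F1 the listed values (0.9038 and 0.8836) differ by 2.02 points rather than the printed +2.24, so one of those two figures may contain a transcription error.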

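Table 7 Comparison of F1-scores by category

Category | BERT | BERT+prompt
Order-brushing rebates | 0.9591 | 0.9844
Impersonating e-commerce/logistics customer service | 0.7939 | 0.7319
Fake online investment and wealth management | 0.8846 | 0.9177
Loans and credit card agency services | 0.9417 | 0.9641
Fake credit reporting | 0.8104 | 0.9305
Fake shopping and services | 0.6980 | 0.7918
Impersonating police, procuratorate, court and government agencies | 0.9021 | 0.9333
Impersonating leaders and acquaintances | 0.9025 | 0.8835
Fake online game product transactions | 0.9130 | 0.9787
Online dating and friendship (excluding fake online investment and wealth management) | 0.6364 | 0.5837
Impersonating military/police for purchases | 0.7907 | 0.8342
Wanghei (网黑) cases | 0.9710 | 0.9831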
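
Macro-averaged F1 is the unweighted mean of the per-category F1-scores, so Table 7 can be cross-checked against Table 6. The sketch below recomputes both means (0.8503 for BERT and 0.8764 for BERT+prompt, matching the macro-averaged column of Table 6) and lists the categories where prompt learning lowered F1. The short English keys are illustrative shorthand for the category names in Table 7.

```python
# Per-category F1 from Table 7: (BERT, BERT+prompt).
F1 = {
    "order_brushing_rebates": (0.9591, 0.9844),
    "ecommerce_logistics_service": (0.7939, 0.7319),
    "online_investment": (0.8846, 0.9177),
    "loans_credit_cards": (0.9417, 0.9641),
    "fake_credit_reporting": (0.8104, 0.9305),
    "fake_shopping_services": (0.6980, 0.7918),
    "impersonating_officials": (0.9021, 0.9333),
    "impersonating_acquaintances": (0.9025, 0.8835),
    "game_product_trading": (0.9130, 0.9787),
    "dating_friendship": (0.6364, 0.5837),
    "military_police_shopping": (0.7907, 0.8342),
    "wanghei_cases": (0.9710, 0.9831),
}

def macro_f1(column: int) -> float:
    """Unweighted mean of one column (0 = BERT, 1 = BERT+prompt)."""
    return sum(pair[column] for pair in F1.values()) / len(F1)

# Categories whose F1 dropped after adding the prompt.
declined = [name for name, (bert, prompt) in F1.items() if prompt < bert]

print(round(macro_f1(0), 4), round(macro_f1(1), 4))  # 0.8503 0.8764
print(declined)
# ['ecommerce_logistics_service', 'impersonating_acquaintances', 'dating_friendship']
```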