Journal of Shandong University (Natural Science), 2016, Vol. 51, Issue 7: 66-73. doi: 10.6040/j.issn.1671-9352.1.2015.007

Online shopping customer service dialogue annotation and analysis

HOU Yongshuai, WANG Xiaolong, CHEN Junjie, ZHOU Xiaoqiang, XU Jun, CHEN Qingcai

  1. Intelligent Computing Research Center, Shenzhen Graduate School of Harbin Institute of Technology, Shenzhen 518055, Guangdong, China
  • Received: 2015-11-14 Online: 2016-07-20 Published: 2016-07-27
  • About the author: HOU Yongshuai (1981- ), male, Ph.D. candidate; research interests: automatic question answering and information retrieval. E-mail: houyongshuai@hitsz.edu.cn
  • Funding:
    General Program of the National Natural Science Foundation of China (61272383, 61173075); Shenzhen Strategic Emerging Industries Development Special Fund (JCYJ20120613151940045)

Abstract: Research on interactive question answering lacks corpora drawn from real application environments. This paper collects a large number of online shopping customer service dialogue logs as such a corpus. The dialogue logs are first analyzed statistically; then 174 dialogue sessions are randomly selected and annotated, and statistics are compiled for three kinds of interactive language phenomena: non-standard language, question relevance, and question-answer matching. The annotation statistics show that: high-frequency sentences make up a large share of online shopping dialogues, with 15% of sentences accounting for more than 45% of all customer service response sentences; non-standard language phenomena appear in 50% of dialogue sentences; anaphora relevance, ellipsis relevance, and common word sequence relevance are the three most important question relevance features; cross matching between questions and answers occurs in more than 60% of dialogue sessions; and more than 50% of matched question-answer pairs show explicit matching features between question and answer.

Key words: corpus annotation, interactive question answering, corpus analysis, customer service dialogue
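
The abstract's high-frequency finding (15% of sentences covering more than 45% of customer service responses) can be illustrated with a small coverage calculation. The sketch below is not the authors' procedure: it assumes that "15% of sentences" means the most frequent 15% of distinct sentences, and the function name `top_share_coverage` and the toy replies are invented for illustration.

```python
from collections import Counter
import math


def top_share_coverage(sentences, top_fraction=0.15):
    """Return the share of all sentence occurrences covered by the most
    frequent `top_fraction` of distinct sentences.

    Assumption: the high-frequency group is defined over distinct
    sentences; the paper may define it differently.
    """
    counts = Counter(sentences)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Size of the high-frequency group, rounded down, but at least one.
    k = max(1, math.floor(len(counts) * top_fraction))
    covered = sum(count for _, count in counts.most_common(k))
    return covered / total


# Toy agent replies, invented for illustration (not taken from the corpus).
replies = ["hello"] * 6 + ["in stock"] * 3 + ["a", "b", "c", "d", "e"]
coverage = top_share_coverage(replies, top_fraction=0.15)
print(f"top 15% of distinct replies cover {coverage:.1%} of all replies")
```

On a real log, `replies` would hold one normalized string per customer service utterance; how utterances are normalized before counting (whitespace, punctuation, emoticons) is a further choice this sketch leaves open.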

CLC number: TP391