JOURNAL OF SHANDONG UNIVERSITY(NATURAL SCIENCE), 2015, Vol. 50, Issue (03): 6-10. doi: 10.6040/j.issn.1671-9352.3.2014.284

Weibo new word recognition combining frequency characteristic and accessor variety

ZHOU Chao, YAN Xin, YU Zheng-tao, HONG Xu-dong, XIAN Yan-tuan   

  1. School of Information Engineering and Automation of Computer Science, Kunming University of Science and Technology; Key Lab of Computer Technologies Application of Yunnan Province and Kunming, Kunming 650500, Yunnan, China
  • Received:2014-09-19 Revised:2015-01-16 Online:2015-03-20 Published:2015-03-13

Abstract: With the rapid development of Weibo, many new words have appeared. These words spread quickly and combine flexibly with other words, so word segmentation tends to split them into separate strings. To address this, a new word recognition method that combines word frequency characteristics with accessor variety is proposed. First, a large-scale collection of Weibo sentences is segmented into words, and each pair of adjacent strings lying between stop words is combined. New word candidate strings are then selected according to the frequency of the combined strings and filtered with word formation rules to obtain candidate new words. Finally, garbage strings are removed using the accessor variety of each candidate, which yields the new words. Experiments on new word recognition in COAE 2014 task 3 show that the method reaches an accuracy of 36.5% and performs well on this task.
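
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the thresholds `min_freq` and `min_av`, and the toy data are assumptions made for illustration, and the word formation rules are omitted because the abstract does not list them. Accessor variety is taken here as the smaller of the numbers of distinct left-neighbor and right-neighbor characters, with sentence boundaries simplified to a single marker on each side.

```python
from collections import Counter

def candidate_strings(segmented_sentences, stop_words, min_freq=2):
    """Join adjacent tokens that are not stop words and keep frequent joins."""
    counts = Counter()
    for tokens in segmented_sentences:
        for left, right in zip(tokens, tokens[1:]):
            # A stop word acts as a boundary: never merge across it.
            if left in stop_words or right in stop_words:
                continue
            counts[left + right] += 1
    return {s: c for s, c in counts.items() if c >= min_freq}

def accessor_variety(candidate, sentences):
    """min(#distinct left neighbors, #distinct right neighbors) of candidate."""
    left, right = set(), set()
    for text in sentences:
        start = text.find(candidate)
        while start != -1:
            end = start + len(candidate)
            # Simplification: sentence start/end count as one marker each.
            left.add(text[start - 1] if start > 0 else "<S>")
            right.add(text[end] if end < len(text) else "<E>")
            start = text.find(candidate, start + 1)
    return min(len(left), len(right))

# Toy data (hypothetical): a segmenter has split the word "给力" in two.
raw = ["我觉得给力啊", "真给力的表现", "太给力了"]
segmented = [
    ["我", "觉得", "给", "力", "啊"],
    ["真", "给", "力", "的", "表现"],
    ["太", "给", "力", "了"],
]
stops = {"我", "的", "了", "啊"}

candidates = candidate_strings(segmented, stops, min_freq=2)
min_av = 2  # assumed threshold; the abstract does not give one
new_words = [s for s in candidates if accessor_variety(s, raw) >= min_av]
```

In this toy run only the merged string "给力" occurs often enough to become a candidate, and it passes the accessor variety filter because it appears with three different characters on each side. A string that always appears inside the same longer context would have a low accessor variety and be discarded as a garbage string.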

Key words: Weibo new words, string frequency statistics, accessor variety, word formation rules

CLC Number: TP391