JOURNAL OF SHANDONG UNIVERSITY (NATURAL SCIENCE), 2016, Vol. 51, Issue (1): 77-83. doi: 10.6040/j.issn.1671-9352.3.2014.289


A bilingual word alignment method of Vietnamese-Chinese based on deep neural network

MO Yuan-yuan1, GUO Jian-yi1,2*, YU Zheng-tao1,2, MAO Cun-li1,2, NIU Yi-tong1   

1. School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650051, Yunnan, China;
2. Intelligent Information Processing Key Laboratory, Kunming University of Science and Technology, Kunming 650051, Yunnan, China
Received: 2015-03-03; Online: 2016-01-16; Published: 2016-11-29

Abstract: Automatic word alignment between Vietnamese and Chinese is difficult because the two languages differ considerably in syntax and structure. We therefore present a novel Vietnamese-Chinese word alignment method based on a deep neural network (DNN). First, Vietnamese and Chinese words are converted into word embeddings, which serve as the input to the DNN. Second, a DNN-HMM word alignment model is constructed by extending the HMM alignment model and incorporating contextual information. HMM and IBM Model 4 serve as the baseline models in the experiments. Results on a large-scale Vietnamese-Chinese bilingual word alignment task show that, compared with both baselines, the method significantly improves precision and recall and substantially reduces the alignment error rate (AER).
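The two steps in the abstract can be illustrated with a small sketch: word embeddings from a context window feed a DNN that scores each source-target word pair, and an HMM combines that lexical score with a jump-distance transition score, decoded with Viterbi. This is a minimal, hypothetical illustration, not the authors' implementation. It assumes a window of one word on each side, a single tanh hidden layer, randomly initialised (untrained) embeddings and weights in place of pre-trained word vectors, a transition score that simply penalises jumps other than +1, and no NULL alignment. It follows the spirit of context-dependent DNN-HMM alignment (Yang et al., ACL 2013). The names `ToyDnnHmmAligner` and `evaluate` and all constants are illustrative. `evaluate` computes precision, recall and AER with the usual sure (S) and possible (P) gold-link definitions of Och and Ney (2003).

```python
import math
import random

DIM = 4      # embedding size (hypothetical; tiny for illustration)
WIN = 1      # context words on each side of the aligned pair (assumption)
HIDDEN = 6   # hidden-layer width (hypothetical)
PAD = "<pad>"

_rng = random.Random(0)


def _rand_vec(n):
    return [_rng.uniform(-0.5, 0.5) for _ in range(n)]


class ToyDnnHmmAligner:
    """Untrained toy DNN-HMM aligner: DNN lexical score + jump-distance transition."""

    def __init__(self, jump_penalty=0.5):
        self.emb = {}  # word -> vector; a real system would load pre-trained embeddings
        in_dim = 2 * (2 * WIN + 1) * DIM  # source window + target window
        self.w1 = [_rand_vec(in_dim) for _ in range(HIDDEN)]
        self.b1 = _rand_vec(HIDDEN)
        self.w2 = _rand_vec(HIDDEN)
        self.jump_penalty = jump_penalty

    def _embed(self, word):
        if word not in self.emb:
            self.emb[word] = _rand_vec(DIM)
        return self.emb[word]

    def _window(self, sent, k):
        # Concatenate embeddings of sent[k-WIN .. k+WIN], padding outside the sentence.
        feats = []
        for p in range(k - WIN, k + WIN + 1):
            feats.extend(self._embed(sent[p] if 0 <= p < len(sent) else PAD))
        return feats

    def lexical_score(self, src, j, tgt, i):
        # One tanh hidden layer over both context windows, linear scalar output.
        x = self._window(src, j) + self._window(tgt, i)
        h = [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        return sum(w * v for w, v in zip(self.w2, h))

    def transition_score(self, prev_j, j):
        # Favour monotone steps (jump of +1); penalise other jump distances linearly.
        return -self.jump_penalty * abs(j - prev_j - 1)

    def align(self, src, tgt):
        """Viterbi decoding: returns {(source_index, target_index)}, one link per target word."""
        m, n = len(src), len(tgt)
        if m == 0 or n == 0:
            return set()
        lex = [[self.lexical_score(src, j, tgt, i) for j in range(m)] for i in range(n)]
        best, back = [lex[0][:]], [[0] * m]  # best[i][j]: best score with a_i = j
        for i in range(1, n):
            row, ptr = [], []
            for j in range(m):
                k = max(range(m), key=lambda k: best[i - 1][k] + self.transition_score(k, j))
                row.append(best[i - 1][k] + self.transition_score(k, j) + lex[i][j])
                ptr.append(k)
            best.append(row)
            back.append(ptr)
        j = max(range(m), key=lambda j: best[-1][j])
        path = [j]
        for i in range(n - 1, 0, -1):  # follow back-pointers to recover a_0 .. a_{n-1}
            j = back[i][j]
            path.append(j)
        path.reverse()
        return {(path[i], i) for i in range(n)}


def evaluate(pred, sure, possible):
    """Precision, recall and AER over sure (S) and possible (P) gold links, with S within P."""
    possible = possible | sure
    a_s, a_p = len(pred & sure), len(pred & possible)
    precision = a_p / len(pred) if pred else 0.0
    recall = a_s / len(sure) if sure else 0.0
    denom = len(pred) + len(sure)
    aer = 1.0 - (a_s + a_p) / denom if denom else 0.0
    return precision, recall, aer


if __name__ == "__main__":
    vi = ["tôi", "yêu", "Việt", "Nam"]  # hypothetical Vietnamese sentence (syllables)
    zh = ["我", "爱", "越南"]            # hypothetical Chinese sentence (segmented)
    links = ToyDnnHmmAligner().align(vi, zh)
    print(sorted(links))  # arbitrary, because the weights are untrained
    sure, possible = {(0, 0), (1, 1), (2, 2)}, {(3, 2)}
    print(evaluate({(0, 0), (1, 1), (3, 2)}, sure, possible))
```

Because the weights are untrained, the links printed for the example sentence are arbitrary; only their shape (exactly one Vietnamese position per Chinese word) is meaningful. For the hand-made gold links, the predicted set gives precision 1, recall 2/3 and AER 1/6, since the link (3, 2) counts only as a possible link. A practical system would train the DNN parameters; how this paper does so is not stated in the abstract.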

Key words: word alignment, DNN, Vietnamese, Chinese

CLC Number: TP391