-
Chinese-Japanese multi-word phrase extraction and alignment based on multi-strategy filtering
- TANG Liang, LI Qian, XU Hong-bo, YI Mian-zhu
-
JOURNAL OF SHANDONG UNIVERSITY (NATURAL SCIENCE). 2015, 50(09): 21-28.
doi:10.6040/j.issn.1671-9352.3.2014.016
-
Abstract
In cross-language text analysis, a multi-word phrase is less ambiguous and more precise than a single word, which helps in understanding a text more accurately. Existing methods focus mainly on cross-language alignment of single words. This paper presents a method for extracting and aligning Chinese-Japanese multi-word phrases based on multi-strategy filtering, which combines multi-word phrase extraction with cross-language alignment. First, we obtain semantically complete multi-word phrases using repeated strings, left-right adjacent entropy, internal relationship, multi-word nesting, stop-word filtering, and related strategies. Second, we use a parallel corpus to compute the similarity of Chinese and Japanese multi-word phrases and thereby achieve cross-language alignment. Throughout the process, we take the rules and characteristics of the Japanese language into account and dynamically adjust the thresholds according to the size and domain of the corpus, in order to improve the applicability of the extracted multi-word phrases. Experimental results show that the method is effective at extracting Chinese-Japanese multi-word phrases as alignment units, which makes the semantic expression more complete and of greater practical value.
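As a rough illustration of two of the ideas named in the abstract, the sketch below computes left-right adjacent entropy for a candidate phrase and a co-occurrence similarity over a parallel corpus. It is not the authors' implementation. The abstract does not specify the similarity measure, so the Dice coefficient is used here purely as an assumption, and the function names `adjacent_entropy` and `dice_similarity`, the boundary markers, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def adjacent_entropy(tokens, phrase):
    """Return (left_entropy, right_entropy) in bits for `phrase` in `tokens`.

    High entropy on both sides suggests the phrase appears in varied
    contexts and is therefore more likely to be a complete unit.
    """
    phrase = tuple(phrase)
    n = len(phrase)
    left, right = Counter(), Counter()
    for i in range(len(tokens) - n + 1):
        if tuple(tokens[i:i + n]) == phrase:
            # "<s>" and "</s>" are assumed boundary markers for text edges.
            left[tokens[i - 1] if i > 0 else "<s>"] += 1
            right[tokens[i + n] if i + n < len(tokens) else "</s>"] += 1

    def entropy(counter):
        total = sum(counter.values())
        if total == 0:
            return 0.0
        return sum(c / total * math.log2(total / c) for c in counter.values())

    return entropy(left), entropy(right)


def dice_similarity(pairs, zh_phrase, ja_phrase):
    """Dice coefficient of two phrases over aligned (zh, ja) sentence pairs.

    Assumption: a phrase "occurs" in a sentence if it is a substring of it.
    """
    zh = sum(1 for z, _ in pairs if zh_phrase in z)
    ja = sum(1 for _, j in pairs if ja_phrase in j)
    both = sum(1 for z, j in pairs if zh_phrase in z and ja_phrase in j)
    return 2 * both / (zh + ja) if zh + ja else 0.0


# Toy data, invented for illustration only.
tokens = ["a", "x", "y", "b", "c", "x", "y", "d"]
left_h, right_h = adjacent_entropy(tokens, ["x", "y"])

pairs = [
    ("机器 翻译 系统", "機械 翻訳 システム"),
    ("机器 翻译", "機械 翻訳"),
    ("系统", "システム"),
]
score = dice_similarity(pairs, "机器 翻译", "機械 翻訳")
```

In a pipeline of the kind the abstract describes, candidates whose entropy or similarity falls below a threshold would be filtered out. The threshold values, and how they would be adjusted for corpus size and domain, are not given in the abstract and are left open here.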