J4, 2012, Vol. 47, Issue (5): 43-48.


A Chinese organization's full name and abbreviation matching algorithm based on edit distance

HUANG Lin-sheng1, DENG Zhi-hong1,2, TANG Shi-wei1,2, WANG Wen-qing3, CHEN Ling3   

1. School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China;
2. Key Laboratory of Machine Perception (Ministry of Education), Peking University, Beijing 100871, China;
3. Administrative Center of China Academic Library and Information System, Beijing 100871, China
Received: 2011-11-10   Online: 2012-05-20   Published: 2012-06-01

Abstract:

When applied to the specific problem of matching a Chinese organization's full name with its abbreviation, the traditional string-matching algorithm based on edit distance performs poorly. A new algorithm, also based on edit distance, is proposed. Its improvements are as follows: (1) the Chinese word segmentation is adapted to the structural features of Chinese grammar, (2) the edit-operation weights are modified using a redefined semantic similarity, (3) these weights are adjusted by adaptive learning, and (4) the full name with the minimum edit distance is chosen as the matching result. Experimental results show that the algorithm effectively achieves higher abbreviation-full name matching accuracy.
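The abstract does not give the exact cost functions, so the following Python code is only a minimal sketch of steps (2) and (4) under stated assumptions, not the authors' implementation. It assumes the abbreviation is split into single characters and each candidate full name is already segmented into words (segmented by hand here, standing in for step (1)). A simple character-containment score stands in for the paper's redefined semantic similarity, so substituting a token with a similar word is cheap. The names `containment`, `weighted_edit_distance` and `best_full_name`, and the insertion cost of 0.5, are hypothetical; the adaptive weight learning of step (3) is not modeled.

```python
from typing import Callable, Sequence


def containment(a: str, b: str) -> float:
    """Hypothetical stand-in similarity: share of a's characters that occur in b."""
    chars = set(a)
    if not chars:
        return 1.0
    return len(chars & set(b)) / len(chars)


def weighted_edit_distance(
    abbr: Sequence[str],
    full: Sequence[str],
    sim: Callable[[str, str], float] = containment,
    ins_cost: float = 1.0,
    del_cost: float = 1.0,
) -> float:
    """Token-level edit distance in which substituting x by y costs 1 - sim(x, y)."""
    m, n = len(abbr), len(full)
    # d[i][j]: cheapest way to turn abbr[:i] into full[:j]
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + del_cost
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + ins_cost
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 1.0 - sim(abbr[i - 1], full[j - 1])
            d[i][j] = min(
                d[i - 1][j] + del_cost,  # delete an abbreviation token
                d[i][j - 1] + ins_cost,  # insert a full-name word
                d[i - 1][j - 1] + sub,   # match token to word (cheap if similar)
            )
    return d[m][n]


def best_full_name(abbr, candidates, **kwargs):
    """Pick the candidate full name with the minimum weighted edit distance."""
    return min(candidates, key=lambda full: weighted_edit_distance(abbr, full, **kwargs))


# Hand-segmented candidates; the abbreviation 北大 is split into characters.
candidates = [["清华", "大学"], ["北京", "师范", "大学"], ["北京", "大学"]]
print(best_full_name(["北", "大"], candidates, ins_cost=0.5))  # ['北京', '大学']
```

With these assumed weights the distances are 1.0, 0.5 and 0.0 respectively, so 北京大学 is selected. The lower insertion cost reflects the assumption that a full name usually contains words that the abbreviation omits; in the paper such weights are tuned by adaptive learning rather than fixed by hand.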

Key words: text mining; machine learning; edit distance; organization name; abbreviation-full name match
