JOURNAL OF SHANDONG UNIVERSITY (NATURAL SCIENCE) ›› 2015, Vol. 50 ›› Issue (03): 20-27. doi: 10.6040/j.issn.1671-9352.3.2014.101

Entity set expansion based on LDA and label propagation

MA Yu-feng, RUAN Tong   

  1. Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
  • Received: 2014-08-28 Revised: 2014-11-25 Online: 2015-03-20 Published: 2015-03-13

Abstract: Set expansion refers to expanding a partial set of "seed" objects into a more complete set. A widely employed approach to set expansion is iterative bootstrapping, which can be applied with only small amounts of supervision but scales poorly to very large corpora. A well-known problem with iterative bootstrapping is semantic drift: as bootstrapping proceeds, unreliable patterns are likely to produce false extractions. To address this issue, a hybrid method for entity set expansion based on LDA and label propagation was proposed. All entities in an entity list were considered together to avoid word ambiguity, and an LDA model was used to mine semantic information from the contexts between entity lists to counter semantic drift. Experiments were conducted on several datasets, and the evaluation demonstrated the effectiveness, efficiency, and scalability of the proposed solution.
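
The abstract does not say how the graph is built or how LDA output enters it, so the following is only a minimal sketch of generic clamped label propagation, not the authors' pipeline. It assumes that nodes are candidate entities, that edge weights are some similarity score (for example, one derived from topic distributions), and that a few nodes are seeds with known labels. The function name `propagate_labels`, the toy weights, and the two-class setup are all hypothetical.

```python
def propagate_labels(weights, seeds, num_classes, iterations=50):
    """Clamped label propagation on a weighted graph (illustrative sketch).

    weights: n x n list of non-negative similarity weights (assumed input).
    seeds: dict mapping node index -> class index for labeled nodes.
    Returns an n x num_classes list of per-node label distributions.
    """
    n = len(weights)

    # Row-normalize similarities so each row is a transition distribution.
    trans = []
    for row in weights:
        total = sum(row)
        trans.append([w / total if total else 0.0 for w in row])

    # Start every node from a uniform distribution over classes.
    labels = [[1.0 / num_classes] * num_classes for _ in range(n)]

    for _ in range(iterations + 1):
        # Clamp seed nodes back to their known labels.
        for node, cls in seeds.items():
            labels[node] = [1.0 if c == cls else 0.0 for c in range(num_classes)]

        # Each node takes the weighted average of its neighbors' labels.
        updated = []
        for i in range(n):
            row = [sum(trans[i][j] * labels[j][c] for j in range(n))
                   for c in range(num_classes)]
            # Isolated nodes (all-zero row) keep their previous distribution.
            updated.append(row if sum(row) else labels[i])
        labels = updated

    # Final clamp so seeds are reported with their known labels.
    for node, cls in seeds.items():
        labels[node] = [1.0 if c == cls else 0.0 for c in range(num_classes)]
    return labels


# Toy chain of five candidate entities: 0 - 1 - 2 - 3 - 4.
# The weak 2-3 edge stands in for a low-similarity link (hypothetical values).
toy_weights = [
    [0.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.2, 0.0],
    [0.0, 0.0, 0.2, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 0.0],
]
# Node 0 is a seed inside the target set (class 0); node 4 is outside it (class 1).
toy_result = propagate_labels(toy_weights, seeds={0: 0, 4: 1}, num_classes=2)
for node, dist in enumerate(toy_result):
    print(node, [round(p, 3) for p in dist])
```

On this toy graph, nodes 1 and 2 end up mostly in class 0 and node 3 mostly in class 1, because the weak edge between nodes 2 and 3 limits how far each seed's label spreads. That is one intuition for why a similarity graph can counter drift: a candidate reachable only through a low-similarity link inherits little of the seed label.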

Key words: topic model, seed, LDA, label propagation, entity set expansion

CLC Number: TP391