JOURNAL OF SHANDONG UNIVERSITY(NATURAL SCIENCE), 2014, Vol. 49, Issue (12): 23-29. doi: 10.6040/j.issn.1671-9352.3.2014.123


Feature extraction method based on Bootstrapping in English product comment

WANG Hui, CHEN Guang   

  1. Pattern Recognition and Intelligent System Laboratory, Beijing University of Posts and Telecommunications, Beijing 100876, China
  • Received: 2014-08-28 Revised: 2014-10-21 Online: 2014-12-20 Published: 2014-12-20

Abstract: A Bootstrapping-based method for extracting product features from English product reviews is proposed. The method starts from a set of extraction patterns as seeds and applies an incremental, iterative procedure to find new features. In each iteration, the system ranks the new candidate features by a score computed from how closely the candidate features are associated with the patterns, which helps prevent topic drift. After the features are extracted, WordNet is used to calculate the similarity between them. The features are then clustered by similarity score to obtain the different aspects of the product, and low-scoring clusters are filtered out to remove noise. In addition, to improve the portability of the system, seed patterns are used in place of seed features. Experimental results show that the method extracts features well: precision, recall and F-measure reach 0.799, 0.779 and 0.789, respectively.
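
The iterative procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the pattern representation (a hypothetical pair of left and right context words around a one-word candidate), the stand-in score (the number of distinct trusted patterns that extract a candidate), the `top_k` cutoff and the toy reviews are all assumptions made for illustration. The WordNet-based clustering step is not shown.

```python
from collections import defaultdict


def match_patterns(sentences, patterns):
    """Map each candidate word to the set of patterns that extracted it.

    A pattern is a hypothetical (left_word, right_word) pair; the candidate
    is the single token between them.
    """
    hits = defaultdict(set)
    for tokens in sentences:
        for i in range(1, len(tokens) - 1):
            context = (tokens[i - 1], tokens[i + 1])
            if context in patterns:
                hits[tokens[i]].add(context)
    return hits


def bootstrap(sentences, seed_patterns, rounds=3, top_k=2):
    """Grow a feature set from seed patterns (assumed scoring, see lead-in)."""
    patterns = set(seed_patterns)
    features = set()
    for _ in range(rounds):
        hits = match_patterns(sentences, patterns)
        # Stand-in score: how many distinct trusted patterns support a candidate.
        candidates = [(len(contexts), word) for word, contexts in hits.items()
                      if word not in features]
        # Keep only the best-scoring candidates to limit topic drift.
        candidates.sort(key=lambda pair: (-pair[0], pair[1]))
        accepted = [word for _, word in candidates[:top_k]]
        if not accepted:
            break
        features.update(accepted)
        # Learn new patterns from the contexts of the accepted features.
        for tokens in sentences:
            for i in range(1, len(tokens) - 1):
                if tokens[i] in features:
                    patterns.add((tokens[i - 1], tokens[i + 1]))
    return features


# Toy reviews, invented for this example.
reviews = [
    "the screen is sharp".split(),
    "the battery is weak".split(),
    "its battery works well".split(),
    "its zoom works well".split(),
]
found = bootstrap(reviews, seed_patterns={("the", "is")})
print(sorted(found))
```

In this toy run the seed pattern first yields "screen" and "battery"; the context of "battery" in the third review then contributes a new pattern, which in turn extracts "zoom". Starting from patterns rather than from seed features is what the abstract credits for the system's portability.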

Key words: WordNet, bootstrapping, information extraction, feature extraction

CLC Number: 

  • TP391