Journal of Shandong University (Natural Science) ›› 2018, Vol. 53 ›› Issue (3): 13-23. doi: 10.6040/j.issn.1671-9352.0.2017.064
余传明1,冯博琳1,田鑫1,安璐2*
YU Chuan-ming1, FENG Bo-lin1, TIAN Xin1, AN Lu2*
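Abstract: Transfer learning addresses the difficulty supervised learning has in achieving good classification performance on small datasets. Unlike traditional supervised learning, it does not assume that the training set and the test set follow the same or similar data distributions. The model learns in a source language that is rich in labeled resources and projects target-language documents into the same feature space as the source language, which addresses the difficulty of obtaining a good classification model for a target language with little data. Using Amazon reviews in Chinese, English, and Japanese from the books, DVD, and music categories as experimental data, and sentiment analysis as the research task, we propose a new cross-lingual deep representation learning (CLDRL) model that transfers knowledge across language environments. Experimental results show that the CLDRL model reaches a best F1 score of 78.59% in the cross-lingual setting, demonstrating the effectiveness of the model.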
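The general idea in the abstract (train on a label-rich source language, project target-language documents into the source feature space, then score with F1) can be illustrated with a small sketch. This is not the CLDRL model: the deep representation learner is replaced here by a plain least-squares linear projection, the classifier is a least-squares linear classifier, and all data are synthetic. Every name below (`make_language`, `proj`, `predict`, the dimensions, the noise level) is a hypothetical choice made for illustration only.

```python
import numpy as np

def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1)."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return float(2 * precision * recall / (precision + recall))

def make_language(rng, latent, rotation, noise=0.05):
    """Synthetic 'language': a rotated, slightly noisy view of shared latent features."""
    return latent @ rotation + noise * rng.normal(size=latent.shape)

rng = np.random.default_rng(0)
dim = 5
# Assumption: each language is a different rotation of one shared latent space.
rot_src = np.linalg.qr(rng.normal(size=(dim, dim)))[0]
rot_tgt = np.linalg.qr(rng.normal(size=(dim, dim)))[0]

# Label-rich source language; sentiment is the sign of the first latent factor.
z_src = rng.normal(size=(400, dim))
y_src = (z_src[:, 0] > 0).astype(int)
x_src = make_language(rng, z_src, rot_src)

# A small set of unlabeled parallel documents seen in both languages.
z_par = rng.normal(size=(50, dim))
par_src = make_language(rng, z_par, rot_src)
par_tgt = make_language(rng, z_par, rot_tgt)

# Target-language test documents (labels are used only for evaluation).
z_tgt = rng.normal(size=(200, dim))
y_tgt = (z_tgt[:, 0] > 0).astype(int)
x_tgt = make_language(rng, z_tgt, rot_tgt)

# Learn a projection from target features into the source feature space.
proj, *_ = np.linalg.lstsq(par_tgt, par_src, rcond=None)

# Train a linear classifier on source-language data only (targets in {-1, +1}).
w, *_ = np.linalg.lstsq(x_src, (2 * y_src - 1).astype(float), rcond=None)

def predict(x):
    return (x @ w > 0).astype(int)

f1_projected = f1_score(y_tgt, predict(x_tgt @ proj))
f1_unprojected = f1_score(y_tgt, predict(x_tgt))
print(f"F1 with projection:    {f1_projected:.3f}")
print(f"F1 without projection: {f1_unprojected:.3f}")
```

In this toy setup the source-trained classifier is applied to target documents only after they have been mapped into the source feature space; the unprojected score is printed for comparison. The numbers it prints say nothing about the paper's reported results.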