
Journal of Shandong University (Natural Science), 2016, Vol. 51, Issue (1): 95-100. doi: 10.6040/j.issn.1671-9352.1.2015.C03


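Text topic mining of archives research based on SVD

FENG Guo-he, WANG Dan-di, LI Mei-chan

  1. College of Economics & Management, South China Normal University, Guangzhou 510006, Guangdong, China
  • Received: 2015-05-27 Online: 2016-01-16 Published: 2016-11-29
  • About the author: FENG Guo-he (b. 1971), male, PhD, professor, master's supervisor. Main research interests: intelligent processing of archival information, data mining. E-mail: ghfeng@163.com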


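Abstract: Projects approved by the National Social Science Fund of China in the field of archival science from 2010 to 2014 were collected. The project titles were preprocessed (word segmentation and related steps) to obtain a term-document matrix. Local and global weights were designed according to term importance and then combined to give the weighted entries of the term-document matrix. Singular value decomposition (SVD) was used for feature dimensionality reduction, and the research topics of the archival science projects funded over these five years were studied at different dimensions. Visual analysis yielded seven major research topics: protection of intangible cultural heritage, electronic records management, digital resource construction and systems, value and mining of archival information resources, archive protection mechanisms, research on archival institutions, and archival information security.

Key words: weight design, topic mining, singular value decomposition, archives project, term-document matrix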
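
The pipeline summarized in the abstract (term-document matrix, local and global weighting, SVD reduction) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which weights were designed, so a log local weight and an entropy global weight are assumed here, and the pre-tokenized English titles are hypothetical stand-ins for project titles that would in practice need Chinese word segmentation first. All function names are illustrative.

```python
import numpy as np

# Hypothetical, pre-tokenized project titles (not data from the paper).
titles = [
    ["electronic", "records", "management"],
    ["electronic", "records", "preservation"],
    ["intangible", "heritage", "protection"],
    ["intangible", "heritage", "archives"],
]

def term_document_counts(docs):
    """Return (vocabulary, counts) with counts[i, j] = frequency of term i in doc j."""
    vocab = sorted({term for doc in docs for term in doc})
    row = {term: i for i, term in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for term in doc:
            counts[row[term], j] += 1.0
    return vocab, counts

def log_entropy_weight(counts):
    """Combine a log local weight with an entropy global weight (assumed scheme)."""
    n_docs = counts.shape[1]  # must be > 1 so that log2(n_docs) is nonzero
    local = np.log2(counts + 1.0)
    # Share of each term's total occurrences that falls in each document.
    p = counts / counts.sum(axis=1, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        p_log_p = np.where(p > 0, p * np.log2(p), 0.0)
    # Terms spread evenly over all documents get a weight near 0,
    # terms concentrated in one document get a weight of 1.
    global_weight = 1.0 + p_log_p.sum(axis=1) / np.log2(n_docs)
    return local * global_weight[:, np.newaxis]

def reduce_documents(weighted, k):
    """Project documents onto the top-k singular directions; one row per document."""
    u, s, vt = np.linalg.svd(weighted, full_matrices=False)
    return (s[:k, np.newaxis] * vt[:k, :]).T

def cosine(a, b):
    """Cosine similarity between two document coordinate vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

vocab, counts = term_document_counts(titles)
weighted = log_entropy_weight(counts)
doc_coords = reduce_documents(weighted, k=2)
```

Documents whose reduced coordinates point in similar directions, as measured by `cosine`, can be read as sharing a topic. Changing `k` corresponds to examining the topics at a different dimension.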


CLC number: TP391.1