JOURNAL OF SHANDONG UNIVERSITY (NATURAL SCIENCE), 2015, Vol. 50, Issue (07): 76-79. doi: 10.6040/j.issn.1671-9352.3.2014.217


Improvement of Lucene full-text indexing efficiency

LI Sheng-dong1, LÜ Xue-qiang2, SUN Jun3, SHI Shui-cai2,4   

    1. Department of Computer Engineering, Langfang Yanjing Polytechnic College, Langfang 065200, Hebei, China;
    2. Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing 100101, China;
    3. North China Institute of Aerospace Engineering, Langfang 065000, Hebei, China;
    4. Beijing TRS Information Technology Co., Ltd., Beijing 100101, China
Received: 2014-10-20; Online: 2015-07-20; Published: 2015-07-31

Abstract: Lucene is an excellent open-source full-text search framework; by extending its functions in accordance with the framework specification, it can be embedded into a custom search engine. The structure and principles of the Lucene index were studied, and indexing efficiency was improved by refining incremental indexing, enlarging the in-memory index buffer, and reducing how often the index is written to disk. A full-text retrieval experiment was designed. The results show that the average efficiency of creating an index for 10 000 documents improved by 19.5%, and the method has promising application prospects.
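The abstract names three levers: incremental indexing, a larger in-memory index buffer, and fewer writes to disk. They can be illustrated with a toy model. The sketch below is not Lucene code and not the authors' implementation; the class `BufferedIndexSketch`, its whitespace tokenizer, and the `maxBufferedDocs` threshold are hypothetical. Documents are inverted into a RAM buffer and written out as a new immutable "segment" (standing in for a disk write) only when the buffer fills. A larger buffer therefore means fewer writes, and new documents are appended without rebuilding earlier segments.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical toy model, not Lucene's API: postings accumulate in a RAM buffer
 * and are written out as an immutable segment only when the buffer holds
 * maxBufferedDocs documents. Old segments are never rebuilt (incremental indexing).
 */
public class BufferedIndexSketch {
    private final int maxBufferedDocs;                                  // flush threshold (hypothetical knob)
    private final Map<String, List<Integer>> buffer = new HashMap<>();  // term -> doc ids held in RAM
    private final List<Map<String, List<Integer>>> segments = new ArrayList<>(); // stand-in for on-disk segments
    private int bufferedDocs = 0;
    private int nextDocId = 0;
    private int flushCount = 0;

    public BufferedIndexSketch(int maxBufferedDocs) {
        this.maxBufferedDocs = maxBufferedDocs;
    }

    /** Tokenizes on whitespace, records each term once per document, flushes when the buffer is full. */
    public void addDocument(String text) {
        int docId = nextDocId++;
        for (String term : text.toLowerCase().split("\\s+")) {
            if (term.isEmpty()) continue;
            List<Integer> postings = buffer.computeIfAbsent(term, t -> new ArrayList<>());
            if (postings.isEmpty() || postings.get(postings.size() - 1) != docId) {
                postings.add(docId);
            }
        }
        if (++bufferedDocs >= maxBufferedDocs) {
            flush();
        }
    }

    /** Writes the buffer out as one new immutable segment (simulating a single disk write). */
    public void flush() {
        if (bufferedDocs == 0) return;
        segments.add(new TreeMap<>(buffer));
        buffer.clear();
        bufferedDocs = 0;
        flushCount++;
    }

    /** Looks a term up in every flushed segment and in the not-yet-flushed buffer. */
    public List<Integer> search(String term) {
        String key = term.toLowerCase();
        List<Integer> hits = new ArrayList<>();
        for (Map<String, List<Integer>> segment : segments) {
            hits.addAll(segment.getOrDefault(key, List.of()));
        }
        hits.addAll(buffer.getOrDefault(key, List.of()));
        return hits;
    }

    public int flushCount() { return flushCount; }

    public int segmentCount() { return segments.size(); }

    public static void main(String[] args) {
        for (int bufferSize : new int[] {10, 100, 1000}) {
            BufferedIndexSketch index = new BufferedIndexSketch(bufferSize);
            for (int i = 0; i < 10_000; i++) {
                index.addDocument("lucene document " + i + (i % 2 == 0 ? " even" : " odd"));
            }
            index.flush(); // write out any remainder
            System.out.println("buffer=" + bufferSize + " flushes=" + index.flushCount()
                    + " hits(even)=" + index.search("even").size());
        }
    }
}
```

With 10 000 documents, raising the threshold from 10 to 1000 cuts the simulated segment writes from 1000 to 10, while a search for "even" still finds all 5000 matching documents. The trade-off, in this model as in real indexers, is more memory held between flushes. In Lucene itself the comparable settings live on `IndexWriterConfig` (for example `setRAMBufferSizeMB`, `setMaxBufferedDocs`, and `OpenMode.CREATE_OR_APPEND` for adding to an existing index); defaults and flushing details vary by version.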

Key words: full-text index, full-text retrieval, information retrieval, efficiency

CLC Number: TP393