山东大学学报(理学版) ›› 2018, Vol. 53 ›› Issue (3): 54-62.doi: 10.6040/j.issn.1671-9352.1.2017.040
宋省身1,杨岳湘1,江宇2
SONG Xing-shen1, YANG Yue-xiang1, JIANG Yu2
摘要: 布尔查询中的求交操作被广泛应用于各种信息系统中,是进行文档检索的基本操作之一。其基本形式可以视作多个有序整数序列的交集问题,而提高求交运算的效率是当前研究的重点。在传统求交算法的基础上,利用单指令多数据流(single instruction multiple data, SIMD)并行指令集,针对其核心的搜索步骤,提出了两种基于SIMD的跳跃式搜索算法。该算法在提高性能的同时,能有效适配在传统多倒排链求交算法中。实验证明,优化后的算法相比未使用SIMD的情况下有了很大的提升,甚至优于SIMD优化后的两两相交算法,性能最高提升37.3%。
中图分类号:
[1] CULPEPPER J S, MOFFAT A. Efficient set intersection for inverted indexing[J]. ACM Transactions on Information Systems, 2010, 29(1):1-25. [2] ZOBEL J, MOFFAT A. Inverted files for text search engines[J]. ACM Computing Surveys, 2006, 38(2):6. [3] BOŽA V. Experimental comparison of set intersection algorithms for inverted indexing[C/OL] // ITAT Proceedings, CEUR Workshop Proceedings. 2013, 1003: 58-64.[2017-02-05].http://www.ceur-ws.org/Vol-1003/58.pdf [4] SANDERS P, TRANSIER F. Intersection ininteger inverted indices[C] //Proceedings of the 9th Workshop on Algorithm Engineering and Experiments/4th Workshop on Analytic Algorithmics and Combinatorics.Philadelphia:SIAM, 2007, Article 71. [5] BAEZA-YATES R. Afast set intersection algorithm for sorted sequences[C] // Proceedings of the 15th Annual Symposium on Combinatorial Pattern Matching. Berlin: Springer-Verlag, 2004:400-408. [6] JÉRÉMY BARBAY, LU T, SALINGER A. An experimental investigation of set intersection algorithms for text searching [J]. Journal of Experimental Algorithmics, 2009, 14:37. DOI:10.1145/1498698.1564507. [7] BLELLOCHG E, REID-MILLER M. Fast set operations using treaps[C] // Proceedings of the 10th Annual ACM Symposium on Parallel Algorithms and Architectures. New York: ACM, 1998:16-26. [8] NAVARROG, PUGLISI S J. Dual-sorted inverted lists[C] // Proceedings of the 17th International Conference on String Processing and Information Retrieval. Berlin: Springer-Verlag, 2010:309-321. [9] CULPEPPER J S, MOFFAT A. Compact set representation for information retrieval[C] //Proceedings of the 14th International Symposium on String Processing and Information Retrieval. Berlin: Springer-Verlag, 2007:137-148. [10] DING B. Fast set intersection in memory[J]. Proceedings of the VLDB Endowment, 2011, 4(4):255-266. [11] AO N, ZHANG F, WU D, et al. Efficient parallel lists intersection and index compression algorithms using graphics processing units[J]. Proceedings of the Vldb Endowment, 2011, 4(8):470-481. [12] TATIKONDAS, JUNQUEIRA F, CAMBAZOGLU B B, et al. On efficient posting list intersection with multicore processors[C] // Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval. New York: ACM, 2009:738-739. [13] TAKUMA D, YANAGISAWA H. Faster upper bounding of intersection sizes[C] //International ACM SIGIR Conference on Research and Development in Information Retrieval. New York: ACM, 2013:703-712. [14] 闫宏飞, 张旭东, 单栋栋,等. 基于指令级并行的倒排索引压缩算法[J]. 计算机研究与发展, 2015, 52(5):995-1004. YAN Hongfei, ZHANG Xudong, SHAN Dongdong.SIMD-based inverted index compression algorithms[J]. Journal of Computer Research and Development, 2015, 52(5):995-1004. [15] DEMAINE E D, LÓPEZ-ORTIZ A, MUNRO J I. Adaptive set intersections, unions, and differences[C] // Proceedings of the 11th ACM-SIAM Symposium on Discrete Algorithms. New York: ACM, 2000:743-752. [16] DEMAINE E D, LÓPEZ-ORTIZ A, MUNRO J I. Experiments on adaptive set intersections for text retrieval systems[C] //Proceedings of Revised Papers from the 3rd International Workshop on Algorithm Engineering and Experimentation. London: Springer-Verlag, 2001:91-104. [17] BARBAY J, PEZ-ORTIZ A, LU T. Faster adaptive set intersections for text searching[C] //Proceedings of the 5th International Workshop on Experimental Algorithms(WEA 2006). Berlin: Springer-Verlag, 2006: 146-157. [18] SCHLEGEL B, GEMULLA R, LEHNER W. Fast integer compression using SIMD instructions[C] // Proceedings of the 6th International Workshop on Data Management on New Hardware(DaMoN'10).New York: ACM, 2010:34-40. [19] LEMIRE D, BOYTSOV L, KURZ N. SIMD compression and the intersection of sorted integers[J]. Software Practice & Experience, 2014, 46(6):723-749. [20] INOUE H, OHARA M, TAURA K. Faster set intersection with SIMD instructions by reducing branch mispredictions[J]. Proceedings of the Vldb Endowment, 2014, 8(3):293-304. |
[1] | 毛福林, 瞿有利. 一种变长编码压缩倒排索引算法[J]. 山东大学学报(理学版), 2014, 49(12): 30-35. |
|