JOURNAL OF SHANDONG UNIVERSITY(NATURAL SCIENCE) ›› 2018, Vol. 53 ›› Issue (3): 54-62. doi: 10.6040/j.issn.1671-9352.1.2017.040

Efficient multiple sets intersection using SIMD instructions

SONG Xing-shen1, YANG Yue-xiang1, JIANG Yu2   

  1. College of Computer, National University of Defense Technology, Changsha 410000, Hunan, China;
  2. Northwest Institute of Nuclear Technology, Xi'an 710024, Shaanxi, China
  • Received: 2017-07-04 Online: 2018-03-20 Published: 2018-03-13

Abstract: Conjunctive Boolean queries are a fundamental operation in document retrieval and are widely used in information systems and databases. In its most basic and popular form, a conjunctive query can be viewed as the intersection of multiple sets of sorted integers, and improving its efficiency has become an important research topic. Building on traditional intersection algorithms, this paper proposes two optimizations of the underlying search algorithms using SIMD instructions. The optimized search algorithms can be adopted in various multi-set intersection methods to improve intersection efficiency. Experiments show that the optimized algorithms perform much better than the traditional ones and even outperform recent SIMD intersection algorithms, with improvements of up to 37.3%.

Key words: inverted index, vectorized processing, performance evaluation, set intersection

CLC Number: 

  • TP301
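
The setting described in the abstract, intersecting several sorted integer lists by repeatedly searching one list for the elements of another, can be illustrated with a small sketch. The code below is not the paper's algorithm. It is a scalar, smallest-list-first intersection with a galloping (exponential) search, and the function names `gallop` and `intersect` are hypothetical. The search routine marks the kind of step that a SIMD version would replace by comparing several keys per instruction.

```python
from bisect import bisect_left


def gallop(lst, target, lo):
    """Return the first index i >= lo with lst[i] >= target (len(lst) if none)."""
    step = 1
    hi = lo
    # Double the step until we reach an element >= target or run off the end.
    while hi < len(lst) and lst[hi] < target:
        lo = hi + 1
        hi += step
        step *= 2
    # Finish with a binary search inside the bracketed window.
    return bisect_left(lst, target, lo, min(hi, len(lst)))


def intersect(sets):
    """Intersect sorted, duplicate-free integer lists, smallest list first."""
    if not sets:
        return []
    ordered = sorted(sets, key=len)
    result = list(ordered[0])
    for other in ordered[1:]:
        kept, pos = [], 0
        for x in result:
            # Scalar search step; this is where a vectorized search would plug in.
            pos = gallop(other, x, pos)
            if pos == len(other):
                break  # every remaining candidate is larger than all of `other`
            if other[pos] == x:
                kept.append(x)
        result = kept
    return result
```

Starting from the shortest list keeps the candidate set small, so each later list is only probed for elements that can still be in the result.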