Papers and code from the original authors of locality-sensitive hashing (LSH): LSH Algorithm and Implementation (E2LSH)



Locality-Sensitive Hashing (LSH) is an algorithm for solving the approximate or exact Near Neighbor Search problem in high-dimensional spaces. This webpage links to the newest LSH algorithms in Euclidean and Hamming spaces, as well as the E2LSH package, an implementation of an early practical LSH algorithm.

  • Algorithm description: 

    • Newest (not quite) LSH algorithms (2014): These algorithms achieve better performance than the classic LSH algorithms by using data-dependent hashing. They improve over classic LSH algorithms for both Hamming and Euclidean space. These algorithms are not dynamic, however, in contrast to the classic LSH algorithms, which use data-independent hashing and hence allow updates to the pointset.

      Optimal Data-Dependent Hashing for Approximate Near Neighbors (by Alexandr Andoni and Ilya Razenshteyn). In STOC'15 (to appear). Full version in arXiv:1501.01062. 

      Beyond Locality Sensitive Hashing (by Alexandr Andoni, Piotr Indyk, Huy L. Nguyen, and Ilya Razenshteyn). In SODA'14.
      Slides: Here are some slides by Alexandr Andoni on the early version from SODA'14.

    • Survey of LSH in CACM (2008): "Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions" (by Alexandr Andoni and Piotr Indyk). Communications of the ACM, vol. 51, no. 1, 2008, pp. 117-122. (CACM disclaimer).
      Also available directly from CACM (for free).

    • Not-so-recent algorithm for Euclidean space (2006): "Near-Optimal Hashing Algorithms for Near Neighbor Problem in High Dimensions" (by Alexandr Andoni and Piotr Indyk). In FOCS'06.

      Slides on this LSH algorithm from a talk given by Piotr Indyk. 

    • Earlier algorithm for Euclidean space (2006): a good introduction to LSH, and a description of the state of affairs as of 2006, is in the following book chapter:

      Locality-Sensitive Hashing Scheme Based on p-Stable Distributions (by Alexandr Andoni, Mayur Datar, Nicole Immorlica, Piotr Indyk, and Vahab Mirrokni), appearing in the book Nearest Neighbor Methods in Learning and Vision: Theory and Practice, by T. Darrell and P. Indyk and G. Shakhnarovich (eds.), MIT Press, 2006. 

      See also the book introduction for a smooth introduction to the NN problem and LSH.

    • Original LSH algorithm (1999): the previous version of the algorithm, for the Hamming distance, is described in the [GIM'99] paper.


  • Implementation of LSH: download the E2LSH package (alpha-version). The code is based on the algorithm described in the book chapter (2006) from above. You can download the manual for the code. The code has been developed by Alex Andoni in 2004-2005. 
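
As a rough illustration of the p-stable hashing scheme named in the book chapter above, the following is a minimal Python sketch of a data-independent LSH index for Euclidean distance. It is not the E2LSH code and does not reproduce its interface: the class name `PStableLSH`, the methods `insert` and `query`, and the parameter values `k`, `L`, and `w` are all hypothetical, chosen only for illustration. Each elementary hash has the form h(v) = floor((a · v + b) / w), where a is a vector of independent Gaussian entries and b is a random offset in [0, w].

```python
import math
import random
from collections import defaultdict


class PStableLSH:
    """Toy data-independent LSH index for Euclidean distance (illustrative only)."""

    def __init__(self, dim, k=4, L=8, w=4.0, seed=0):
        # k, L, w are arbitrary illustrative values, not tuned E2LSH settings.
        rng = random.Random(seed)
        self.w = w
        # L tables; each table key concatenates k elementary hashes
        # h(v) = floor((a . v + b) / w), with Gaussian a and b uniform on [0, w].
        self.funcs = [
            [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, w))
             for _ in range(k)]
            for _ in range(L)
        ]
        self.tables = [defaultdict(list) for _ in range(L)]
        self.points = []

    def _key(self, table_idx, v):
        # Bucket key of v in the given table: a tuple of k integers.
        return tuple(
            math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / self.w)
            for a, b in self.funcs[table_idx]
        )

    def insert(self, v):
        # The hash functions do not depend on the data, so points can be
        # added at any time without rebuilding the index.
        idx = len(self.points)
        self.points.append(tuple(v))
        for t, table in enumerate(self.tables):
            table[self._key(t, v)].append(idx)
        return idx

    def query(self, q, radius):
        # Collect points sharing a bucket with q in any table, then keep
        # only those whose true distance to q is within the radius.
        candidates = set()
        for t, table in enumerate(self.tables):
            candidates.update(table.get(self._key(t, q), ()))
        return sorted(i for i in candidates
                      if math.dist(self.points[i], q) <= radius)


index = PStableLSH(dim=3)
index.insert([1.0, 2.0, 3.0])
index.insert([100.0, -50.0, 7.0])
print(index.query([1.0, 2.0, 3.0], radius=0.5))
```

Because the hash functions are drawn before any data is seen, `insert` can be called at any point, which matches the remark above that classic, data-independent LSH allows updates to the pointset. A query may miss some near neighbors: a point is found only if it collides with the query in at least one of the `L` tables.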

This research was supported in part by NSF CAREER Grant #0133849 "Approximate Algorithms for High-dimensional Geometric Problems". 



from: http://web.mit.edu/andoni/www/LSH/index.html
