C++ simhash implementation for documents, plus an additional (prototype) simhash index for text documents.
Requirements:
- python3
- scons
- g++ (c++14)
- CPU with hardware AES support; to check, the output of
cat /proc/cpuinfo | grep "aes" | wc -l
should be > 0
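The same check can be sketched in Python if you want to run it from a script. This is only an illustration and not part of the repository: the function name `has_aes_flag` is hypothetical, and it assumes a Linux-style /proc/cpuinfo where CPU capabilities are listed on a `flags` line (x86) or a `Features` line (ARM).

```python
def has_aes_flag(cpuinfo_text: str) -> bool:
    """Return True if any CPU capability line lists the 'aes' flag.

    Hypothetical helper: expects the text content of /proc/cpuinfo.
    """
    for line in cpuinfo_text.splitlines():
        # Each line looks like "key<TAB>: value"; split on the first colon.
        key, _, value = line.partition(":")
        # Only capability lines count, so an "aes" in e.g. the model name is ignored.
        if key.strip() in ("flags", "Features") and "aes" in value.split():
            return True
    return False
```

On a Linux machine it could be called as `has_aes_flag(open("/proc/cpuinfo").read())`. Unlike the grep one-liner, it matches `aes` only as a whole flag name.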
To build, just run scons.
Add text documents to the index using simidx.py:
# add one document
./simidx.py add textfile
# add a folder
./simidx.py add textfolder
# after you have created an index, you can query it with
./simidx.py query <document.txt>
For the approach and core idea, have a look at the papers in doc.
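As a rough orientation, the general simhash idea can be sketched in a few lines of Python. This is not the code in this repository: whitespace tokenization, the 64-bit width, the use of MD5 as the per-token hash (instead of a hardware AES based hash), and all function names here are assumptions made for illustration.

```python
import hashlib


def token_hash(token: str, bits: int = 64) -> int:
    """Hash one token to a `bits`-wide integer (MD5 is an illustrative choice)."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[: bits // 8], "big")


def simhash(text: str, bits: int = 64) -> int:
    """Compute a simhash fingerprint over whitespace-separated tokens."""
    counts = [0] * bits
    for token in text.lower().split():
        h = token_hash(token, bits)
        # Each token votes +1 or -1 on every bit position.
        for i in range(bits):
            counts[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    # A bit is set in the fingerprint if the votes for it are positive.
    for i, c in enumerate(counts):
        if c > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")
```

Two documents are then compared with `hamming_distance(simhash(doc_a), simhash(doc_b))`. Similar texts tend to give a small distance, which is the property a simhash index relies on when answering queries.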