Shared-memory based Hash Table extension for Python
BE CAREFUL: this package is not for general-purpose usage; it only accepts keys shorter than `max_key_size` and values shorter than `max_value_size`. And although it is shared-memory based, it does NOT use locks to avoid concurrency problems. It was designed for a former project that had one write process and many read processes, which started only after the writer had finished.
For examples, see the test cases in the Python files (pyshmht/Cacher.py, pyshmht/HashTable.py), where you can also find performance tests.
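A minimal usage sketch of the intended single-writer / many-readers pattern is shown below. The constructor arguments (shared-memory file name, `capacity`, `force_init`) are assumptions for illustration; see HashTable.py for the actual signature.

```python
import pyshmht

# Writer process: create and fill the table (constructor arguments are assumed).
ht = pyshmht.HashTable('/dev/shm/example.ht', capacity=1024, force_init=True)
ht.set('foo', 'bar')

# Reader processes: attach to the same shared-memory table.
# There is no locking, so readers should start only after the writer has finished.
reader = pyshmht.HashTable('/dev/shm/example.ht', capacity=1024)
print(reader.get('foo'))  # -> 'bar'
```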
Performance tests: capacity = 200M, 64-byte keys/values, run on a Xeon E5-2670 0 @ 2.60GHz with 128GB RAM:
- hashtable.c (raw hash table in C, tested on `malloc`ed memory)
  - set: 0.93M iops; get: 2.35M iops
- performance_test.py (raw Python binding)
  - set: 451k iops; get: 272k iops
- HashTable.py (simple wrapper, no serialization)
  - set: 354k iops; get: 202k iops
- Cacher.py (cached wrapper, with serialization)
  - set: 501k iops (cached), 228k iops (after write_back)
  - get: 560k iops (cached), 238k iops (no cache)
- Python native dict
  - set: 741k iops; get: 390k iops
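The figures above were produced by the scripts listed; the following is a rough sketch of that kind of micro-benchmark, where iops is simply operations divided by elapsed seconds. The `HashTable` constructor arguments and the 1M-operation count are assumptions, not the exact test code.

```python
import time
import pyshmht

n = 1000 * 1000  # number of operations (assumed; the tests above used capacity=200M)
ht = pyshmht.HashTable('/dev/shm/bench.ht', capacity=n, force_init=True)  # arguments assumed

keys = ['%064d' % i for i in range(n)]  # 64-byte keys, matching the test sizes above

start = time.time()
for k in keys:
    ht.set(k, k)  # 64-byte values as well
elapsed = time.time() - start
print('set: %.0fk iops' % (n / elapsed / 1000.0))

start = time.time()
for k in keys:
    ht.get(k)
elapsed = time.time() - start
print('get: %.0fk iops' % (n / elapsed / 1000.0))
```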
In hashtable.c, the default max key length is `256 - 4` bytes and the default max value length is `1024 - 4` bytes; you can change `bucket_size` and `max_key_size` manually, but bear in mind that increasing these two parameters will result in larger memory consumption.
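Because oversized keys or values are simply rejected, it can help to guard writes in the calling code. Below is a minimal sketch; the constant values mirror the defaults quoted above, and the strict `>=` boundary is an assumption.

```python
# Assumed to mirror the default limits compiled into hashtable.c.
MAX_KEY_SIZE = 256 - 4
MAX_VALUE_SIZE = 1024 - 4

def checked_set(ht, key, value):
    """Refuse keys/values that would exceed the compiled-in limits (boundary assumed)."""
    if len(key) >= MAX_KEY_SIZE or len(value) >= MAX_VALUE_SIZE:
        raise ValueError('key or value too large for this shmht build')
    return ht.set(key, value)
```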
If you find any bugs, please submit an issue or send me a pull request; I'll see to it ASAP :)
P.S. `hashtable.c` is independent (i.e. it has nothing to do with Python), so you can use it in other projects if needed. :P