Using Storage
`Storage` is an interface that we use to abstract away various filesystems and cloud providers. You give it a provider layer path, and then you can download or upload files relative to that path.
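To make the idea of a provider layer path concrete, here is a minimal sketch of how such a path could be split into a protocol, a bucket, and a base path, and how a file name could then be resolved relative to it. The helper names `split_layer_path` and `resolve` are hypothetical and used only for illustration; this is not necessarily how `Storage` parses paths internally.

```python
import posixpath
from urllib.parse import urlparse


def split_layer_path(layer_path):
    """Split a layer path such as gs://bucket/a/b into (protocol, bucket, base path).

    Hypothetical helper for illustration; not part of the Storage API.
    """
    parts = urlparse(layer_path)
    return parts.scheme, parts.netloc, parts.path


def resolve(layer_path, filename):
    """Return (protocol, bucket, full path) for a file relative to the layer path."""
    protocol, bucket, base = split_layer_path(layer_path)
    return protocol, bucket, posixpath.join(base, filename)


if __name__ == "__main__":
    for url in (
        "file:///tmp/removeme/read_write",
        "gs://neuroglancer/removeme/read_write",
        "s3://neuroglancer/removeme/read_write",
    ):
        print(resolve(url, "info"))
```

The protocol decides which backend handles the request, while the file name is always joined onto the same base path, which is why the calling code can stay the same across providers.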
`Storage` provides (Python) multithreading to accelerate uploads and downloads over HTTP/1 connections. You can set the number of threads to use; 0 threads means everything runs on the main program thread. If you use too many threads (somewhere between 64 and 128 on my machine), it will crash.
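The reason threads help here, even though Python threads share a single core, is that downloads spend most of their time waiting on the network. The sketch below illustrates that effect with the standard library's `ThreadPoolExecutor` and a simulated download (`fake_download`, which just sleeps). Both functions are stand-ins written for this example, assuming a fixed 50 ms latency per request; they are not the `Storage` implementation.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(name, latency=0.05):
    """Stand-in for a network request: sleep, then return dummy content."""
    time.sleep(latency)  # the thread is idle here, so other threads can run
    return name, b"content"


def get_files(names, n_threads):
    """Fetch all names; n_threads == 0 means run serially on the calling thread."""
    if n_threads == 0:
        return [fake_download(name) for name in names]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # pool.map preserves the order of the input names
        return list(pool.map(fake_download, names))


def timed(names, n_threads):
    """Return (elapsed seconds, results) for one get_files call."""
    start = time.perf_counter()
    results = get_files(names, n_threads)
    return time.perf_counter() - start, results


if __name__ == "__main__":
    names = ["info"] * 8
    for n in (0, 2, 8):
        elapsed, _ = timed(names, n)
        print(n, round(elapsed, 3))
```

With purely local work there is no waiting to overlap, so the same pattern would add thread overhead without any benefit.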
We tested `get_files` on a dual-core 2014 MacBook Pro (2.4 GHz) over a decent wireless connection. Note that Python threads only use a single core.
The version tested was commit 26b3606240ca66d7dbe6def33aab4dba7bb316be.
Service | Threads | Time (sec) |
---|---|---|
file | 0 | 0.0036 |
file | 2 | 0.0039 |
file | 4 | 0.0037 |
file | 8 | 0.0053 |
file | 16 | 0.0045 |
file | 32 | 0.0058 |
file | 64 | 0.0070 |
gs | 0 | 27.8455 |
gs | 1 | 10.5758 |
gs | 2 | 4.9513 |
gs | 4 | 2.5868 |
gs | 8 | 1.4941 |
gs | 16 | 0.9418 |
gs | 32 | 0.7500 |
gs | 64 | 0.6997 |
S3 | 0 | 10.0914 |
S3 | 1 | 1.6661 |
S3 | 2 | 0.9482 |
S3 | 4 | 0.6604 |
S3 | 8 | 0.5300 |
S3 | 16 | 0.2337 |
S3 | 32 | 0.2419 |
S3 | 64 | 0.4772 |
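As a quick way to read the table, the following sketch finds the fastest thread count for each service and the speedup relative to 0 threads. The numbers are copied from the table above; the helper name `best_setting` is only illustrative.

```python
# Timings in seconds, copied from the table above: service -> {threads: time}
timings = {
    "file": {0: 0.0036, 2: 0.0039, 4: 0.0037, 8: 0.0053,
             16: 0.0045, 32: 0.0058, 64: 0.0070},
    "gs": {0: 27.8455, 1: 10.5758, 2: 4.9513, 4: 2.5868,
           8: 1.4941, 16: 0.9418, 32: 0.7500, 64: 0.6997},
    "S3": {0: 10.0914, 1: 1.6661, 2: 0.9482, 4: 0.6604,
           8: 0.5300, 16: 0.2337, 32: 0.2419, 64: 0.4772},
}


def best_setting(by_threads):
    """Return (fastest thread count, speedup over the 0-thread run)."""
    threads = min(by_threads, key=by_threads.get)
    speedup = by_threads[0] / by_threads[threads]
    return threads, speedup


if __name__ == "__main__":
    for service, by_threads in timings.items():
        threads, speedup = best_setting(by_threads)
        print("%s: fastest at %d threads, %.1fx faster than 0 threads"
              % (service, threads, speedup))
```

In these measurements, local files gain nothing from threading, gs keeps improving up to 64 threads (roughly 40x faster than 0 threads), and S3 peaks at 16 threads (roughly 43x) before slowing down again at 64.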
The code used to generate these timings is listed below. The command to run the test is:
py.test -s -v python/test/test_storage.py
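# Python 2 syntax (xrange, print statement).
# Storage must also be imported from the package under test;
# the import path is not shown in the original listing.
import time

def test_performance():
    def run(url, num_threads):
        s = Storage(url, n_threads=num_threads)
        content = 'some_string'
        # Upload a single small file, then wait for the upload queue to drain
        s.put_file('info', content, compress=False)
        s.wait_until_queue_empty()
        # Time 50 downloads of that same file
        start = time.time()
        s.get_files([ 'info' for i in xrange(50) ])
        end = time.time()
        s._kill_threads()
        return end - start

    urls = [
        "file:///tmp/removeme/read_write",
        "gs://neuroglancer/removeme/read_write",
        "s3://neuroglancer/removeme/read_write"
    ]
    for url in urls:
        # 0 (main thread only), then 1, 2, 4, ..., 64 threads
        n_threads = [ 0 ] + [ 2 ** i for i in xrange(0,7) ]
        for num in n_threads:
            delta = run(url, num)
            print url, num, delta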