Very long time to write skims to shared data buffer when using sharrow #8
Comments
Whoa, that is crazy. To confirm: this is using the skim OMX files encoded with blosc2, right? I am curious if the load times are better, worse, or the same if you use the "original" OMX-standard zlib skims.
These results are from the "original" OMX-standard skims, and yes, the same OMX files were used in the non-sharrow run. Perhaps related, it was taking a ton of time to unpack the downloaded example data. Wondering if @i-am-sijia has had similar issues, which might point to a Windows vs. Mac/Linux thing?
Yes, I'm seeing the same slow unpacking. The exercise.py downloaded all 16 v0.2.0 release assets successfully. It's now unpacking them into .\sandag-abm3-example\data_full. It took 1 hour just to go from "reading part000" to "reading part001", and another hour to "reading part002". I suppose it is going to take >10 hours to finish unpacking.
CS has a similar result: loading skims into shared memory took about 12 minutes (using blosc2:zstd) on a Windows server. I believe performance on this step is very hardware-specific and hinges a lot on the bandwidth between the "disk" (typically SSD) and the CPU/RAM. We're loading 140 GB of data from disk into memory; if the read speed of the disk is limited, then this step is going to take a while.
On my MacBook Pro the load time is only 48 seconds, but I've got a very fast machine and the SSD read speed is upwards of 5 GB/sec. Let's not worry about "unpack" times for now; that's not a real ActivitySim function, just a bit of helper code for downloading data for testing. We can stick with SharePoint for large-file service for now and solve that problem later.
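An easy way to sanity-check the disk-bandwidth theory on any of these machines is to time a raw sequential read of one of the skim files, independent of any decompression or ActivitySim code. A minimal hedged sketch (the file path is a placeholder for your own skim file):

```python
# Hedged sketch: measure raw sequential read throughput of a skim file,
# to separate disk bandwidth from decompression/loading overhead.
# The path below is a placeholder, not part of the example data layout.
import time
from pathlib import Path

skim_path = Path("data_full/skims1.omx")  # placeholder path

chunk = 64 * 1024 * 1024  # 64 MB reads
start = time.perf_counter()
nbytes = 0
with open(skim_path, "rb") as f:
    while True:
        buf = f.read(chunk)
        if not buf:
            break
        nbytes += len(buf)
elapsed = time.perf_counter() - start
print(f"read {nbytes / 1e9:.1f} GB in {elapsed:.1f} s "
      f"({nbytes / 1e9 / elapsed:.2f} GB/s)")
```

If the GB/s reported here is far below the numbers quoted above, the bottleneck is the disk (or contention for it), not the skim-loading code.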
Running with the blosc2-compressed skims did not show the same slowdown. This seems to support our theory that there is an inefficiency causing very slow read times with the original zlib-compressed OMX skims.
Ran with the original skims on my machine; "writing skims to shared memory" took about 16 hours, even longer than the 7.5 hours reported by David.
To attempt to replicate the memory profiling result I show above on a different machine, you'll need to update sharrow to version 2.8.3, which adds the ability to pick the dask scheduler used for skim loading, and then use code from https://github.com/camsys/activitysim/tree/skim-read, which uses that option to select the synchronous scheduler.
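For reference, the plain-dask mechanism the skim-read branch relies on is just forcing the synchronous scheduler; how sharrow 2.8.3 actually exposes the choice may differ from this hedged sketch (the toy array below stands in for the skim dataset load):

```python
# Hedged sketch: force dask's synchronous (single-threaded) scheduler so the
# thread pool is out of the picture while the load runs. The dask.array
# computation is a stand-in for the real skim dataset load.
import dask
import dask.array as da

with dask.config.set(scheduler="synchronous"):
    # any .compute()/.load() issued inside this context runs single-threaded
    total = da.ones((1000, 1000), chunks=(100, 100)).sum().compute()
    print(total)
```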
sharrow (v2.8.3): 29aaf6cf561e027587b3b2eb2bb152e0134db8b0. I ran the single-threaded skim load (with the synchronous dask scheduler).
This pattern looks quite different from what @jpn-- profiled earlier in this issue, which was flat and low, then roughly flat and much higher with extreme intermittent spikes.
I tried with the cropped skims. There doesn't seem to be any noticeable difference between them:
- blosc2 compression: 19 seconds
- zlib compression: 18 seconds
The load time difference appears to be size-dependent.
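A timing harness along these lines reproduces that kind of comparison. This is a hedged sketch: the file names are placeholders, and the hdf5plugin import is an assumption about what registers the blosc2 HDF5 filter for reading.

```python
# Hedged sketch: time pulling every core out of an OMX file into numpy arrays.
# File names are placeholders; hdf5plugin is assumed to provide the blosc2
# HDF5 filter needed to read the blosc2-compressed file.
import time
import hdf5plugin  # noqa: F401
import openmatrix as omx

def load_all_cores(path):
    start = time.perf_counter()
    with omx.open_file(path, mode="r") as f:
        cores = {name: f[name][:] for name in f.list_matrices()}
    return time.perf_counter() - start, len(cores)

for path in ("cropped/skims_zlib.omx", "cropped/skims_blosc2.omx"):
    elapsed, n = load_all_cores(path)
    print(f"{path}: {n} cores in {elapsed:.1f} s")
```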
Profiled skim loading without sharrow. activitysim (activitysim/main): 50c05c50b11ed7436d376ac8f4df33baa46af5f7. The sharrow-off mode loads skims sequentially. On a 512 GB, 2.44 GHz Windows machine, it took 20 minutes to load 1755 individual skim cores. The average load time per skim is about 0.7 seconds (max 1.2 seconds). I do not see an irregular pattern.
In the sharrow-off mode, there is an allocate_skim_buffer step. However, when I looked at the memory profile output, I did not see a large block of memory being created and held at the allocate_skim_buffer step; I only saw memory gradually picking up as the skims were being loaded. I then ran the model again, this time pausing the run at that step.
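One way to confirm that gradual-ramp pattern on another machine is to sample the process RSS in the background while the load runs. A hedged, generic sketch (load_skims() below is a placeholder, not an ActivitySim function):

```python
# Hedged sketch: sample the process RSS in a background thread while the
# skims load, to distinguish a single up-front allocation from a gradual ramp.
import threading
import time
import psutil

def sample_rss(stop_event, samples, interval=1.0):
    proc = psutil.Process()
    while not stop_event.is_set():
        samples.append((time.time(), proc.memory_info().rss))
        time.sleep(interval)

samples, stop = [], threading.Event()
t = threading.Thread(target=sample_rss, args=(stop, samples), daemon=True)
t.start()
# load_skims()  # placeholder: trigger the skim-loading step here
stop.set()
t.join()
for ts, rss in samples:
    print(f"{ts:.0f}, {rss / 2**30:.2f} GiB")
```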
CS has re-run the experiments over the weekend, when usage of the same server by other projects is limited. It appears our performance irregularities may be caused by resource competition from other unrelated processes running on the same machine. In the re-run, we observe much better performance overall, as well as an anomaly in the middle of the blosc loading (orange) that is likely due to resource competition. Total load time for zlib skims was about 2 hours, and for blosc skims about 30 minutes. Also, @i-am-sijia replicated the same experiments on WSP's server, which is apparently faster. She observed similar patterns, with no anomaly. Total load time for zlib skims was about 25 minutes, and for blosc about 9 minutes.
Tried with the SEMCOG model, which has a skim file of 2.3 GB on disk and 65 GB in memory. In a non-sharrow run, skim load times are about 3.5 minutes. With sharrow turned on, that jumps to about 14 minutes. This is using the standard zlib skims. Converting the skims to blosc2 format using wring took 7.7 minutes. Trying again with the blosc2 skims, the load took 3 minutes. (Tangentially related: the SEMCOG model doesn't have the land use data sorted by MAZ by default. This can be fixed easily in the code by sorting the dataframe before the sharrow check, as seen in this commit: ActivitySim/activitysim@96d4bb6)
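Whatever tool does the conversion, the underlying operation is re-writing the OMX (HDF5) datasets with a different compression filter. A hedged sketch using h5py and hdf5plugin is below; the file names and the blosc2 zstd/clevel settings are assumptions for illustration, not the exact parameters used above.

```python
# Hedged sketch: copy every dataset in an OMX (HDF5) file into a new file,
# swapping the zlib filter for blosc2:zstd. File names and compression
# settings are illustrative assumptions.
import h5py
import hdf5plugin

SRC, DST = "skims_zlib.omx", "skims_blosc2.omx"  # placeholder file names

with h5py.File(SRC, "r") as src, h5py.File(DST, "w") as dst:
    # carry over file-level attributes (e.g. OMX_VERSION, SHAPE)
    dst.attrs.update(src.attrs)

    def copy_item(name, obj):
        if isinstance(obj, h5py.Dataset):
            # re-write the dataset with blosc2:zstd compression
            out = dst.create_dataset(
                name,
                data=obj[...],
                **hdf5plugin.Blosc2(cname="zstd", clevel=5),
            )
            out.attrs.update(obj.attrs)
        else:  # group: recreate it and copy its attributes
            dst.require_group(name).attrs.update(obj.attrs)

    src.visititems(copy_item)
```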
Update on the abm3 tests:
None of the tests showed a significant improvement in the skim loading time in ActivitySim.
Describe the bug
It took about 7.5 hours to load the skims into the shared data buffer. This has happened on multiple machines now.
The attached timing and activitysim log show the very long runtimes.
timing_log.csv
activitysim.log
The skim load times for non-sharrow runs are roughly 25 minutes, as expected for this model. Additionally, running the MTC example model in sharrow compile mode does not show this same behavior; a log file for that run shows the "activitysim.core.skim_dataset - writing skims to shared memory" step takes just a couple of seconds.
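To compare the step timings across runs, the relevant lines can be pulled straight out of the attached log. A hedged sketch (the search strings come from the log message quoted above; the timestamp format is whatever the logging config emits):

```python
# Hedged sketch: print the log lines around the shared-memory skim write so
# the step's start/end timestamps can be compared directly across runs.
import re

pattern = re.compile(r"writing skims to shared memory|skim_dataset")
with open("activitysim.log", encoding="utf-8") as f:
    for line in f:
        if pattern.search(line):
            print(line.rstrip())
```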
To Reproduce
Run in sharrow compile mode on a Windows server.
Expected behavior
Skim load times should be roughly comparable to non-sharrow mode.
cc: @jpn--, @i-am-sijia