-
Notifications
You must be signed in to change notification settings - Fork 61
Communication Model
First, a number of benchmark tests was dry-ran, with the following parameters:
- name: "test"
- num-cells-per-rank: cpr
- num-ranks: nr
- num-synapses: 1000
- duration: 100
- min-delay: 10
- spike-frequency: 100
- realtime-ratio: 0.1
The parameters nr and cpr take values from the sets (1, 4, 10, 40, 100, 400, 1000, 4000, 10000) and (100, 250, 1000, 2500, 10000), respectively, which yields a total of 9*5=45 tests. The times t all the programs spent in the walkspikes process were measured and a graph t(nr*cpr) was plotted. For visibility and distinguishability, logarithms were taken for both the input and output data. From this graph, the data sets corresponding to test runs with the same cpr were identified and the process was repeated for those with same nr, with the following results:
The preliminary goal was to fit all those lines linearly in order to create an appropriate function which would return the expected execution time, given cpr and nr as a final product. After fitting the mentioned data sets and analyzing the linear functions, two things have been observed:
- The slope of the lines has no visible correlation with cpr on nr (on appropriate graphs) - thus, it was concluded that the slope is constant (c1 for the same-cpr graph and r1 for the same-nr graph, both taken as the mean of the slopes), albeit subject to fluctuations.
- The intersections of the functions with the Y-axis seem to be somewhat equidistant - it was assumed that they are linear with log(cpr) and log(nr), appropriately.
In order to tackle Y-axis intersections, their coordinates were plotted against log(nr) (likewise for log(cpr)):
These plots provided another two pairs of constants, describing the linear functions used on the graphs:
- c21 for the new function's slope, same-cpr graph
- c22 for the new function's intersection with Y-axis, same-cpr graph
- r21 for the new function's slope, same-nr graph
- r22 for the new function's intersection with Y-axis, same-nr graph
Now the fits on the initial two graphs are parametrized - for given nr, cpr, and p corresponding to the X-axis value:
- The lines connecting measurements with same nr are fitted as: r1/p+r21log(nr)+r22
- The lines connecting measurements with same cpr are fitted as: c1/p+c21log(cr)+c22 The intersections of these parametrized functions are illustrated in the image below:
They provide us with the values of p corresponding to the given nr and cpr, which are then plugged in either of the parametrized functions in order to get the appropriate Y-axis value. This is the logarithm of the expected execution time, and thus the final expression (in Wolfram Mathematica) is:
These results were compared to the measured ones in the following graph, where blue dots represent the measured execution times, whereas the red ones represent the estimates:
Despite the two sets of points being clearly correlated, the error of the model increases when we go back to the original data values, instead of the logarithms. This is neatly summarized in a table:
In this model, our simulation is considered to be ran on only one rank (nr = 1) and time is modelled as a function of ns and cpr. At first, an approach similar to the previous model was followed, where data points were grouped by one of their arguments (slicing). Afterwards, a two-variable fit was computed based on the expected behaviour of the program, as well as conclusions from slicing.
Three steps of spike exchange should be considered in order to calculate the complexity expectation:
- Sorting the spikes from the previous epoch - O(sp * log(sp)) = O(cpr * nr * log(cpr * nr))
- Looking for events caused by the spikes in local connection tables - O(sp * log(cpr * ns)) (note: here, instead of ns, the number of connections per cell should be used, albeit in this model the two are equivalent)
- Merge event cues with pending events for each cell - O(sp), roughly
The slicing plots are shown in the images below, taking only a fraction of all the data points for distinguishability.
For the first plot (points grouped by same ns), a linear fit seems to be adequate, with very nice and predictable behaviour of its coefficients (not presented for conciseness). The second plot is fitted as a sum of a linear and a logarithmic term. This assumption is according to the expectations, since before taking the logarithm of the data set, an n*log(n) dependence was expected. This produced three new plots, describing the behaviour of the terms' coefficients for different cpr:
All three parameters exhibit a change of behaviour at log CPR ≈ 3.1. This is attributed the lowest cache memory being filled up.
Next, a new set of data was obtained by running the benchmark on Piz Daint. It was directly fitted using the NonLinearModelFit function in Wolfram Mathematica. The aforementioned change in behaviour is handled using a step function, with the location of the step also being a parameter Mathematica is looking for, with the condition that the fit function is continuous. It has been calculated that it cannot be expected for the derivative of the function to be continuous as well, since it will, due to the nature of the model, produce two identical functions on both sides of the step function, that is, no change in behaviour at all. The fit, while being quite close to the original data, doesn't yield expected results as to the kink, as the program predicts that it lies at CPR ≈ 5000 and forcibly limiting it inside the interval of [1000, 2000] produces much worse results. The fit (navy blue) and the measured data (bright orange) are plotted in the following image: