Memcached Peak Load Tuning (Red lining)
To produce a sensitivity profile, Swan needs to know the peak load for Memcached on the SUT machine. From the peak load, Swan computes load points from 5% to 100% of node capacity.
Memcached should be able to serve roughly 100k-200k QPS per thread.
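The per-thread figure above gives a rough starting range for the peak load. The following is a minimal sketch, assuming throughput scales linearly with the number of Memcached threads, which may not hold on every machine; the function name `estimate_peak_load_range` is hypothetical and not part of Swan.

```python
# Rough per-thread throughput range quoted above (QPS per Memcached thread).
QPS_PER_THREAD_LOW = 100_000
QPS_PER_THREAD_HIGH = 200_000


def estimate_peak_load_range(memcached_threads):
    """Return a (low, high) estimate of peak QPS for the given thread count.

    Assumes linear scaling with the number of threads (an assumption,
    not a guarantee); treat the result only as a starting point for tuning.
    """
    if memcached_threads < 1:
        raise ValueError("memcached_threads must be at least 1")
    return (memcached_threads * QPS_PER_THREAD_LOW,
            memcached_threads * QPS_PER_THREAD_HIGH)


low, high = estimate_peak_load_range(4)
print(f"expected peak load: {low} to {high} QPS")
```

The estimate is only a first guess for manual tuning; the measured peak load takes precedence.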
Automatic Peak Load Tuning
Set `SWAN_PEAK_LOAD` to `0`. Swan will try to find the maximum capacity on its own and then run the experiment.
The results from automatic tuning are not always stable and correct.
Manual Tuning
Pick any peak load, run the baseline with multiple load points and no aggressors, and explore the results in Jupyter. Run the experiment a few times to see whether the results are stable.
# Best Effort workloads should be left empty.
EXPERIMENT_BE_WORKLOADS=
# Pick a high number.
EXPERIMENT_PEAK_LOAD=1000000
# With a peak load of 1M and 20 load points, consecutive load points are 50k RPS apart.
# Load points would be: 50k, 100k, 150k, ..., 1M
EXPERIMENT_LOAD_POINTS=20
# A longer load duration might return more stable results.
EXPERIMENT_LOAD_DURATION=60s
EXPERIMENT_SLO=500
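The load-point arithmetic in the comments above can be sketched as follows. This is an illustration that assumes evenly spaced load points ending at the peak load; the helper `load_points` is hypothetical, and Swan's own implementation may round differently.

```python
def load_points(peak_load, count):
    """Return `count` evenly spaced load points, the last one equal to `peak_load`."""
    if count < 1:
        raise ValueError("count must be at least 1")
    # Integer arithmetic keeps the points exact for round peak loads.
    return [peak_load * i // count for i in range(1, count + 1)]


points = load_points(1_000_000, 20)
print(points[:3], "...", points[-1])
```

With a peak load of 1M and 20 load points, the first point is 5% of the peak (50k) and the last is the peak itself.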
Swan uses mutilate as its load generator for memcached. Mutilate is a distributed, high-performance load generator that has previously been used in published latency studies for memcached.
When running mutilate in a distributed setup, the master process (or coordinator) connects to the target memcached instance in conjunction with the load being generated by the agents. Below, the master connection is highlighted with green and the main load from the agents is highlighted with red.
In this way, the mutilate master continuously communicates the target per-agent load, gets the achieved load back, and takes latency readings on sample connections that it establishes directly to the Memcached instance.
To obtain reasonable results, the Mutilate author suggests the following configuration for Mutilate agents:
- Establish on the order of 100 connections per memcached server thread.
- Don't exceed about 16 connections per mutilate thread. (`MUTILATE_AGENT_CONNECTIONS` flag)
- Use multiple mutilate agents in order to achieve the two points above. (`EXPERIMENT_MUTILATE_AGENT_ADDRESSES` flag)
- Do not use more mutilate threads than hardware cores/threads. (`MUTILATE_AGENT_THREADS` flag)
- Use -Q to configure the "master" agent to take latency samples at a slow, constant rate. (`EXPERIMENT_MUTILATE_MASTER_ADDRESS` flag)
The Swan configuration flag responsible for each setting is listed in parentheses.
The math for establishing the first point is as follows:
Mutilate Connections = number of agents * number of agent threads * agent connection count
Memcached Connections Requirement = 100 * `MEMCACHED_THREADS`
'Mutilate Connections' should be greater than or equal to 'Memcached Connections Requirement'.
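The connection requirement can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the example values (4 Memcached threads, 8 threads per agent, 16 connections per thread) are assumptions chosen for demonstration rather than recommended settings.

```python
def mutilate_connections(agents, agent_threads, agent_connections):
    """Total connections opened by all mutilate agents."""
    return agents * agent_threads * agent_connections


def memcached_connections_requirement(memcached_threads, per_thread=100):
    """Connections needed: on the order of 100 per Memcached server thread."""
    return per_thread * memcached_threads


def agents_needed(memcached_threads, agent_threads, agent_connections):
    """Smallest number of agents that satisfies the connection requirement."""
    required = memcached_connections_requirement(memcached_threads)
    per_agent = agent_threads * agent_connections
    # Ceiling division without floating point.
    return -(-required // per_agent)


# Example values (assumed for illustration).
required = memcached_connections_requirement(4)   # 100 * 4 = 400
agents = agents_needed(4, 8, 16)                  # each agent opens 8 * 16 = 128
total = mutilate_connections(agents, 8, 16)
print(f"required={required}, agents={agents}, total={total}")
assert total >= required
```

Lowering the agent thread or connection count shrinks the per-agent total, so the number of agents has to grow to keep the inequality satisfied.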
In essence, it is unfortunately very easy to see high latency measurements due to unintended interference and client side queuing in mutilate.
On top of these recommendations, we have found that it helps to reduce the number of agent threads and connections and to increase the measurement time to around 30 seconds with the `--load_duration` flag.
To compensate for the fewer connections per agent, you should add more agents.
You are ready to go!
For further debugging, please see the Troubleshooting page.