Home
rtime is a passive TCP response time analysis tool. It watches traffic to and from a specified port and measures the time elapsed from an incoming request to the outgoing response. It periodically prints out a report on the number and response time of these requests.
The functionality is best explained with an example. To watch MySQL’s traffic on port 3306 (a hand-generated example, not real),
# rtime --port 3306 --iterations 2
ts count sum max min avg med stdev 95_max 95_avg 95_med 95_stdev 99_max 99_avg 99_med 99_stdev
1274149776 123 812928 128193 1293 6609 6003 182 11023 5012 4938 102 83223 6038 5832 138
1274149786 1829 1929823 122933 1391 1055 5823 198 12293 5219 5023 130 67230 5829 5223 132
The columns are as follows. Except for the first two columns, everything is measured in microseconds, and pertains to the sample gathered during the time since the last line was printed.
- ts is the Unix timestamp at the end of the period.
- count is how many requests were captured during the period.
- sum is the total response time in microseconds (response time is referred to as R from here on).
- max is the maximum R.
- min is the minimum R.
- avg is sum divided by count.
- med is the median, the middle value of the sorted list of response times.
- stdev is the sample standard deviation.
- 95_max, 95_avg, 95_med, 95_stdev are the respective metrics over the population with the largest 5% discarded.
- 99_max and similar are over the population with the largest 1% discarded.
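As a rough illustration of how one row of the table could be derived, here is a minimal Python sketch. It is not rtime's actual implementation. The function names (`summarize`, `trimmed`, `report_row`), the truncation to whole microseconds, and the way the 5% and 1% cut-offs are rounded are all assumptions made for the example.

```python
import statistics


def summarize(times_us):
    """Basic statistics for a list of response times in microseconds."""
    n = len(times_us)
    if n == 0:
        return {"count": 0, "sum": 0, "max": 0, "min": 0,
                "avg": 0, "med": 0, "stdev": 0}
    total = sum(times_us)
    return {
        "count": n,
        "sum": total,
        "max": max(times_us),
        "min": min(times_us),
        "avg": total // n,  # assumption: truncate to whole microseconds
        "med": int(statistics.median(times_us)),
        # The sample standard deviation needs at least two samples.
        "stdev": int(statistics.stdev(times_us)) if n > 1 else 0,
    }


def trimmed(times_us, percent):
    """Keep the smallest `percent` of the samples, discarding the largest."""
    ordered = sorted(times_us)
    keep = len(ordered) * percent // 100  # assumption: round the cut-off down
    return ordered[:max(1, keep)]


def report_row(times_us):
    """Build the full set of columns (except ts) for one reporting period."""
    row = summarize(times_us)
    for pct in (95, 99):
        stats = summarize(trimmed(times_us, pct))
        for name in ("max", "avg", "med", "stdev"):
            row[f"{pct}_{name}"] = stats[name]
    return row
```

Calling `report_row` on the response times captured during one period yields a dictionary whose keys match the column names above, apart from ts.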
The mode shown above is the default mode, which we’ll call “table mode”. We also want two more modes:
The first, selected with the --windows option, prints the average response time in microseconds over user-specified intervals, given in seconds. For example,
# rtime --port 3306 --windows 1,60,300
123 181 153
This means that during the last second the average response time was 123 microseconds; over the last minute it averaged 181 microseconds; and over the last 5 minutes it averaged 153 microseconds. This mode automatically sets --interval to 1 (the smallest item in the --windows option) and sets --iterations to 0 so it runs forever.
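One way to produce averages over several windows at once is to keep one (sum, count) pair per reporting interval and average over the most recent pairs. The sketch below assumes that approach; the class name `WindowAverages` and the choice to weight the average by request rather than by interval are assumptions, not documented rtime behavior.

```python
from collections import deque


class WindowAverages:
    """Average response time over several trailing windows."""

    def __init__(self, windows):
        # The reporting interval is the smallest window, as described above.
        self.interval = min(windows)
        # Number of intervals each window spans, in the order the user gave.
        self.spans = [w // self.interval for w in windows]
        # Keep only as many intervals as the longest window needs.
        self.buckets = deque(maxlen=max(self.spans))

    def add_interval(self, total_us, count):
        """Record the summed response time and request count of one interval."""
        self.buckets.append((total_us, count))

    def averages(self):
        """Return one average (in microseconds) per configured window."""
        recent = list(self.buckets)
        result = []
        for span in self.spans:
            chunk = recent[-span:]
            total = sum(t for t, _ in chunk)
            count = sum(c for _, c in chunk)
            result.append(total // count if count else 0)
        return result
```

With `WindowAverages([1, 60, 300])`, calling `add_interval` once per second and then printing `averages()` would give one line of three numbers, like the output shown above.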
This will be useful for writing shell scripts and other tools to react to sudden changes in the response time. If the shell script suddenly sees that the response time during the short interval is much different from the averages over longer periods, it can gather diagnostic data about what’s happening.
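A script reacting to this output might look something like the following sketch. The function name `is_anomalous` and the threshold `factor` are hypothetical. What counts as "much different" is left open above, so the factor of 3 is only a placeholder.

```python
def is_anomalous(line, factor=3.0):
    """Return True if the shortest window differs sharply from the longer ones.

    `line` is one line of output, e.g. "123 181 153": the first value is the
    shortest window and the rest are the longer-term averages.
    """
    values = [float(v) for v in line.split()]
    if len(values) < 2:
        return False
    short, baselines = values[0], values[1:]
    for base in baselines:
        if base <= 0:
            continue  # no traffic in that window, nothing to compare against
        if short > base * factor or short < base / factor:
            return True
    return False
```

A wrapper could read rtime's output line by line from standard input and start collecting diagnostics whenever `is_anomalous` returns True.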
The second, selected with the --counter option, prints the running total of response time seen during the entire execution, in seconds. This will be useful for graphing response time with Cacti and similar tools. For example,
# rtime --port 3306 --interval 1 --iterations 0 --counter
8
11
19
29
From this we can see that during the first second there were 8 seconds of total response time, then during the next second there were 3 more, then 8 more, then 10 more after that, and so on. Because the value only ever increases, it fits a COUNTER data source in an RRD file. The --file option will be useful for this.
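The arithmetic in that reading of the output is simply the difference between consecutive totals. A small sketch, with a hypothetical helper name `per_interval`:

```python
def per_interval(readings, start=0):
    """Turn running totals into the amount added during each interval."""
    increments = []
    previous = start
    for reading in readings:
        increments.append(reading - previous)
        previous = reading
    return increments


# The sample output above: 8, 11, 19, 29
print(per_interval([8, 11, 19, 29]))  # prints [8, 3, 8, 10]
```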
The options are as follows.
- --port: which port to watch.
- --interval (default 10): how often, in seconds, to print a line.
- --iterations (default 1): how many reports to print; 0 means run forever.
- --vertical: prints the output in name: value\n format, with a blank line after each report.
- --file: prints the output to a file. On each iteration the file is opened, truncated, written, and closed. If append-only behavior is needed, start rtime with standard output redirected to a file instead.
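The --vertical format and the open, truncate, write, close cycle described for --file could be sketched as follows. The function names `format_vertical` and `write_report` are hypothetical and this is only an illustration of the described behavior, not rtime's code.

```python
def format_vertical(row):
    """Render one report as "name: value" lines followed by a blank line."""
    lines = [f"{name}: {value}\n" for name, value in row.items()]
    return "".join(lines) + "\n"


def write_report(path, text):
    """Replace the file's contents with the latest report.

    Opening in mode "w" truncates the file, and the with-block closes it,
    so a reader only ever sees the most recent report.
    """
    with open(path, "w") as handle:
        handle.write(text)
```

A graphing poller would then read the file at its own pace and always find the latest values.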
The following questions and tasks are still open.
- Can we account for outstanding requests that haven't yet been answered? This would be very useful for tracking performance closer to real time, which is needed to detect a stall as it happens so that a script can be triggered to gather diagnostic data. For example, if a number of requests are made and then stall past the end of the reporting interval, include them in the statistics as though they had been answered at the instant the interval ended.
- Write an init script so this can be run as a daemon with a pid file and all the usual other goodies.
rtime is intended to be used in the following ways.
- Run continuously to see performance, like vmstat or iostat.
- Continually update a file so a monitoring or graphing script like Cacti can gather statistics and build graphs.
- Run as part of a script designed to react to changes in response time, so diagnostics can be gathered.
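The open question above about requests that have not yet been answered could be handled roughly as in this sketch, which treats each pending request as if it had responded at the instant the interval ended. The function name `interval_samples` and the use of integer microsecond timestamps are assumptions made for the example.

```python
def interval_samples(completed_us, pending_start_us, interval_end_us):
    """Response times for one interval, including unanswered requests.

    completed_us:     response times (microseconds) of answered requests.
    pending_start_us: arrival timestamps (microseconds since the epoch) of
                      requests still waiting for a response.
    interval_end_us:  timestamp (microseconds) at which the interval ended.
    """
    # A pending request is counted as though it was answered at interval end.
    pending_us = [interval_end_us - start for start in pending_start_us]
    return list(completed_us) + pending_us
```

The combined list can then be fed to the same statistics as any other sample, so a stall shows up as a jump in max and avg even before the stalled requests complete.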