xaprb edited this page Sep 14, 2010 · 6 revisions

Overview

rtime is a passive TCP response time analysis tool. It watches traffic to and from a specified port and measures the time elapsed from an incoming request to the outgoing response. It periodically prints out a report on the number and response time of these requests.

Major Functionality

The functionality is best explained with an example. To watch MySQL’s traffic on port 3306 (the output below is hand-generated, not real):

# rtime --port 3306 --iterations 2
        ts count     sum    max  min  avg  med stdev 95_max 95_avg 95_med 95_stdev 99_max 99_avg 99_med 99_stdev
1274149776   123  812928 128193 1293 6609 6003   182  11023   5012   4938      102  83223   6038   5832      138
1274149786  1829 1929823 122933 1391 1055 5823   198  12293   5219   5023      130  67230   5829   5223      132

The columns are as follows. Except for the first two columns, everything is measured in microseconds, and pertains to the sample gathered during the time since the last line was printed.

  • ts is the Unix timestamp at the end of the period.
  • count is how many requests were captured during the period.
  • sum is the total response time of all requests in the period, in microseconds. The response time of a single request is referred to as R below.
  • max is the maximum R.
  • min is the minimum R.
  • avg is the average R: sum divided by count.
  • med is the median — the middle of the sorted list.
  • stdev is the sample standard deviation.
  • 95_max, 95_avg, 95_med, 95_stdev are the respective metrics over the population with the largest 5% discarded.
  • 99_max and similar are over the population with the largest 1% discarded.
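
The column definitions above can be sketched in a few lines of Python. This is only an illustration, not rtime's implementation: the function names `trimmed_stats` and `report_row` are hypothetical, and rounding the number of discarded samples down is an assumption the page does not settle.

```python
import statistics

def trimmed_stats(times_us, discard_pct):
    """max/avg/med/stdev after discarding the largest discard_pct percent."""
    ordered = sorted(times_us)
    # Assumption: the number of discarded samples is rounded down.
    keep = len(ordered) - (len(ordered) * discard_pct) // 100
    kept = ordered[:keep]
    return {
        "max": kept[-1],
        "avg": sum(kept) / len(kept),
        "med": statistics.median(kept),
        # The sample standard deviation needs at least two values.
        "stdev": statistics.stdev(kept) if len(kept) > 1 else 0.0,
    }

def report_row(ts, times_us):
    """Build one table-mode row from one period's response times (microseconds)."""
    row = {"ts": ts, "count": len(times_us)}
    if not times_us:
        return row  # nothing captured; the remaining columns are undefined
    row["sum"] = sum(times_us)
    row["min"] = min(times_us)
    row.update(trimmed_stats(times_us, 0))
    for pct in (5, 1):
        for name, value in trimmed_stats(times_us, pct).items():
            row[f"{100 - pct}_{name}"] = value
    return row
```

Sorting once per call keeps the sketch short; a real implementation would probably sort a period's samples once and slice the same list for all three groups of columns.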

Modes

The mode shown above is the default mode, which we’ll call “table mode”. We also want two more modes:

Sliding-window Mode

This prints out average response time in microseconds over user-specified intervals. For example,

# rtime --port 3306 --windows 1,60,300
123 181 153

This means that during the last second the average response time was 123 microseconds; over the last minute it averaged 181 microseconds; and over the last 5 minutes it averaged 153 microseconds. This automatically sets --interval to 1 (the smallest item in the --windows option) and sets --iterations to 0 so it runs forever.
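
One way to produce these numbers is to keep a per-second bucket of (request count, total response time) and average over the most recent buckets for each window. The sketch below assumes that design; the class name `SlidingWindows` is hypothetical, and weighting each window by request count (rather than averaging the per-second averages) is an assumption.

```python
from collections import deque

class SlidingWindows:
    """Average response time over the last N seconds for several values of N."""

    def __init__(self, windows):
        self.windows = list(windows)
        # Only as many one-second buckets as the largest window are retained.
        self.buckets = deque(maxlen=max(self.windows))

    def add_second(self, count, total_us):
        """Record one second's request count and summed response time."""
        self.buckets.append((count, total_us))

    def averages(self):
        """Return one average (microseconds) per configured window."""
        recent = list(self.buckets)
        result = []
        for width in self.windows:
            chunk = recent[-width:]
            requests = sum(count for count, _ in chunk)
            total = sum(total_us for _, total_us in chunk)
            # A window with no requests is reported as 0 here (an assumption).
            result.append(total // requests if requests else 0)
        return result
```

With `--windows 1,60,300` this would hold at most 300 buckets, so memory use stays fixed however long the tool runs.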

This will be useful for writing shell scripts and other tools to react to sudden changes in the response time. If the shell script suddenly sees that the response time during the short interval is much different from the averages over longer periods, it can gather diagnostic data about what’s happening.

Counter Mode

This prints out the sum of response time seen during the entire execution, in units of seconds. This will be useful for graphing response time with Cacti and similar tools. For example,

# rtime --port 3306 --interval 1 --iterations 0 --counter
8
11
19
29

From this we can see that during the first second there were 8 seconds of total response time, then 3 more during the next second, then 8 more, then 10 more after that, and so on. Because the value only ever increases, it fits RRD’s COUNTER data source type, which stores the rate of increase, rather than GAUGE, which stores each value as-is. The --file option will be useful for this.
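
The arithmetic in that reading is a difference between consecutive samples, which is what a graphing tool would compute from a cumulative value. A minimal sketch (the function name `deltas` is hypothetical):

```python
def deltas(counter_values):
    """Per-interval increases between consecutive cumulative counter samples."""
    return [
        later - earlier
        for earlier, later in zip(counter_values, counter_values[1:])
    ]
```

Applied to the sample output, `deltas([8, 11, 19, 29])` gives `[3, 8, 10]`; the first sample, 8, is itself the increase from zero during the first second.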

Options

  • --port which port to watch
  • --interval (default 10) how often, in seconds, to print a line
  • --iterations (default 1) how many reports to print out; 0 means infinity
  • --vertical prints the output in name: value\n format, with a blank line after each output
  • --file prints the output to a file. Each iteration the file is opened, truncated, printed, and closed. If append-only is needed, the script can be started with standard output redirected to the file.
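
The vertical format and the open, truncate, print, close cycle of the file option could look roughly like this. Both function names are hypothetical, and the report is assumed to be available as a mapping from column name to value.

```python
def format_vertical(row):
    """Render a report as 'name: value' lines followed by one blank line."""
    lines = [f"{name}: {value}" for name, value in row.items()]
    return "\n".join(lines) + "\n\n"

def write_report(path, text):
    """Replace the contents of path with the latest report."""
    # Mode "w" truncates the file on open; the with-block closes it.
    with open(path, "w") as handle:
        handle.write(text)
```

Because the file is truncated on every iteration, a poller such as a Cacti script always reads only the most recent report.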

Further Ideas

  • Can we account for the outstanding requests that haven’t yet been answered? This will be very useful for detecting more real-time performance, which is needed for figuring out when a stall is happening so a script can be fired to gather diagnostic data. For example, if a number of requests are made, stalled, and then the end of the reporting interval passes, include these requests into the statistics as though they had just responded at the instant the interval ended.
  • Write an init script so this can be run as a daemon with a pid file and all the usual other goodies.
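
The first idea above, counting unanswered requests as though they had responded at the instant the interval ended, can be sketched as follows. The function name `interval_times` and the use of microsecond timestamps for request start times are assumptions for illustration.

```python
def interval_times(completed_us, outstanding_starts_us, interval_end_us):
    """Response times for one interval, including still-unanswered requests.

    completed_us: response times (microseconds) of requests already answered.
    outstanding_starts_us: start timestamps (microseconds) of requests that
        have not yet been answered.
    interval_end_us: timestamp (microseconds) at which the interval ends.
    """
    # Treat each outstanding request as if it responded exactly at interval end.
    pending = [interval_end_us - start for start in outstanding_starts_us]
    return list(completed_us) + pending
```

A request stalled for half a second then shows up as a 500000-microsecond sample in the current report instead of being invisible until it completes. Whether it is counted again once it finally responds is left open here.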

Uses

  1. Run continuously to see performance, like vmstat or iostat.
  2. Continually update a file so a monitoring or graphing script like Cacti can gather statistics and build graphs.
  3. Run as part of a script designed to react to changes in response time, so diagnostics can be gathered.