This is a Rust command-line tool that computes a histogram of the distinct record types in a JSON log file (one JSON object per line).
A sample input file would be:
```json
{"type":"B","foo":"bar","items":["one","two"]}
{"type": "A","foo": 4.0 }
{"type": "B","bar": "abcd"}
```
The output histogram would report a count of 2 for type B and 1 for type A. It would also report a total of 73 bytes for type B and 26 bytes for type A.
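The aggregation described above can be sketched with the standard library alone. This is a minimal, sequential illustration and not the tool's actual code: `Stats`, `extract_type` and `histogram` are hypothetical names, `extract_type` is a naive string search standing in for a real `serde_json` parse, and the byte total is assumed to be the line length without the trailing newline.

```rust
use std::collections::BTreeMap;

/// Per-type totals: number of records and number of bytes (hypothetical type).
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
struct Stats {
    count: usize,
    bytes: usize,
}

/// Naive stand-in for a JSON parser: returns the string value that follows the
/// first `"type"` key in the line. It ignores escapes and nesting.
fn extract_type(line: &str) -> Option<&str> {
    let key = "\"type\"";
    let after_key = &line[line.find(key)? + key.len()..];
    let after_colon = after_key.trim_start().strip_prefix(':')?.trim_start();
    let value = after_colon.strip_prefix('"')?;
    let end = value.find('"')?;
    Some(&value[..end])
}

/// Builds the histogram keyed by record type; lines without a type are skipped.
fn histogram<'a, I>(lines: I) -> BTreeMap<String, Stats>
where
    I: IntoIterator<Item = &'a str>,
{
    let mut hist: BTreeMap<String, Stats> = BTreeMap::new();
    for line in lines {
        if let Some(kind) = extract_type(line) {
            let entry = hist.entry(kind.to_string()).or_default();
            entry.count += 1;
            // Assumption: size is the line length in bytes, newline excluded.
            entry.bytes += line.len();
        }
    }
    hist
}

fn main() {
    let input = [
        r#"{"type":"B","foo":"bar","items":["one","two"]}"#,
        r#"{"type": "A","foo": 4.0 }"#,
        r#"{"type": "B","bar": "abcd"}"#,
    ];
    for (kind, stats) in histogram(input.iter().copied()) {
        println!("{}: {} records, {} bytes", kind, stats.count, stats.bytes);
    }
}
```

A `BTreeMap` is used here only so that the types print in a stable order; a `HashMap` would work equally well for the counting itself.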
Git clone:
```shell
git clone https://github.com/dimitarvp/json-log-histogram-rust.git
cd json-log-histogram-rust
```
Compile:
```shell
RUSTFLAGS="-C target-cpu=native" cargo build --release
```
To test, generate a JSON log file and supply it as a command-line parameter:
```shell
./target/release/jlh -f /path/to/json/log/file
```
The tool prints an aligned text table and a total runtime at the bottom.
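As a rough illustration of that kind of output, here is a sketch using only the standard library. It is not the tool's code, and the tool itself relies on `prettytable-rs` for rendering. The column names, `format_row` and `render_table` are hypothetical; the sample values are the ones quoted for the sample input earlier.

```rust
use std::time::Instant;

/// Formats one row: first column left-aligned, numeric columns right-aligned.
/// Widths are in bytes, so alignment is only exact for ASCII cells.
fn format_row(cells: [&str; 3], widths: [usize; 3]) -> String {
    format!(
        "| {:<w0$} | {:>w1$} | {:>w2$} |",
        cells[0],
        cells[1],
        cells[2],
        w0 = widths[0],
        w1 = widths[1],
        w2 = widths[2]
    )
}

/// Renders a header plus rows, sizing every column to its widest cell.
fn render_table(header: [&str; 3], rows: &[[String; 3]]) -> String {
    let mut widths = [header[0].len(), header[1].len(), header[2].len()];
    for row in rows {
        for (width, cell) in widths.iter_mut().zip(row.iter()) {
            *width = (*width).max(cell.len());
        }
    }
    let mut lines = vec![format_row(header, widths)];
    for row in rows {
        let cells = [row[0].as_str(), row[1].as_str(), row[2].as_str()];
        lines.push(format_row(cells, widths));
    }
    lines.join("\n")
}

fn main() {
    let started = Instant::now();
    let rows = [
        ["A".to_string(), "1".to_string(), "26".to_string()],
        ["B".to_string(), "2".to_string(), "73".to_string()],
    ];
    println!("{}", render_table(["Type", "Count", "Bytes"], &rows));
    // Total runtime, printed after the table.
    println!("Total runtime: {:.9} seconds", started.elapsed().as_secs_f64());
}
```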
Performance results:

| CPU | File size | Time in seconds |
|---|---|---|
| Xeon W-2150B @ 3.00GHz | 1MB | 0.11091947 |
| Xeon W-2150B @ 3.00GHz | 10MB | 0.62043929 |
| Xeon W-2150B @ 3.00GHz | 100MB | 0.643637170 |
| Xeon W-2150B @ 3.00GHz | 1000MB | 5.175781744 |
| i7-4870HQ @ 2.50GHz | 1MB | 0.07234297 |
| i7-4870HQ @ 2.50GHz | 10MB | 0.68889124 |
| i7-4870HQ @ 2.50GHz | 100MB | 0.670027735 |
| i7-4870HQ @ 2.50GHz | 1000MB | 6.659739416 |
| i3-3217U @ 1.80GHz | 1MB | 0.14369994 |
| i3-3217U @ 1.80GHz | 10MB | 0.49248859 |
| i3-3217U @ 1.80GHz | 100MB | 0.535957719 |
| i3-3217U @ 1.80GHz | 1000MB | 3.773678079 |
- Using Rust 1.43.1.
- Using the `rayon` crate for transparent parallelization of the histogram calculation.
- Using the `clap` crate to parse the command-line options (there is only one: the input JSON log file).
- Using the `prettytable-rs` crate to produce a pretty command-line table with the results.
- Using `serde_json` to read each JSON record into a struct.
- Skipped the ability to pipe files to the tool so it can read from stdin. The reason is that `rayon` does not provide its `.par_bridge()` function for polymorphic `Box<dyn BufRead>` objects (the common denominator of `std::io::stdin().lock()` and `std::fs::File::open(path)`). I could probably have made it work, but after 2 hours of attempts I realized that it might take a long time, so I cut it short.
- Used the `.lines()` function on the `BufReader` even though it allocates a new `String` per line. I am aware of the better `BufRead::read_line` idiom with a single `String` buffer (which is cleared after every line is consumed), and my initial non-parallel version even used it (see this commit). But I couldn't find a quick way to translate this idiom into something that exposes `.lines()` (`rayon` expects an `Iterator`). I could have implemented `Iterator` for a wrapping struct or enum but, same as above, I was not sure it would not take very long. IMO even with that caveat the tool is very fast (see the performance results table).
- The commit history got slightly botched because I had to use bfg to remove the 1MB / 10MB / 100MB / 1000MB JSON files that I added earlier (which I replaced with gzipped variants later).
for a wrapping struct or enum but, same as above, I was not sure if it will not take me very long. IMO even with that caveat the tool is very fast (see performance results table below). - The commit history got slightly botched because I had to use bfg to remove the 1MB / 10MB / 100MB / 1000MB JSON files that I added earlier (which I replaced with gzipped variants later).