Status Feb 1: The challenge has been closed for new submissions. No new pull requests for adding submissions are accepted at this time. Pending PRs will be evaluated over the next few days. Please don't push any changes to pending PRs after today, unless asked to do so. This will be the case if I spot an issue during evaluation (failing tests, etc.). In this case, I will comment on the PR, and you are allowed to push one update. Only changes strictly needed to fix the bug at hand may be pushed at this point. No force-pushes are allowed, so as to make sure I can see which changes have been made. I will re-evaluate the entry, and if there are still remaining issues, you'll get one last opportunity to update the PR. If it still is not valid at this point, it will be closed. The final leaderboard will be published by Monday Feb 5.
Status Jan 31: The challenge will close today at midnight UTC.
Status Jan 12: As there has been such a large number of entries to this challenge so far (100+), and this is becoming hard to manage, please only create new submissions if you expect them to run in 10 seconds or less on the evaluation machine.
Status Jan 1: This challenge is open for submissions!
The One Billion Row Challenge (1BRC) is a fun exploration of how far modern Java can be pushed for aggregating one billion rows from a text file. Grab all your (virtual) threads, reach out to SIMD, optimize your GC, or pull any other trick, and create the fastest implementation for solving this task!
The text file contains temperature values for a range of weather stations.
Each row is one measurement in the format <string: station name>;<double: measurement>, with the measurement value having exactly one fractional digit.
The following shows ten rows as an example:
Hamburg;12.0
Bulawayo;8.9
Palembang;38.8
St. John's;15.2
Cracow;12.6
Bridgetown;26.9
Istanbul;6.2
Roseau;34.4
Conakry;31.2
Istanbul;23.0
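Every value has exactly one fractional digit, so it can be parsed straight into an integer number of tenths instead of a double. The following is a minimal sketch of that idea. The class and method names (RowParser, parse, parseTenths) are hypothetical and do not come from the challenge repository. It assumes well-formed rows as described above and does no error handling.

```java
/** Hypothetical helper: splits a "<station>;<value>" row and parses the value as integer tenths. */
public class RowParser {

    /** A parsed row; the temperature is kept in tenths of a degree (12.3 -> 123). */
    record Row(String station, int tenths) {}

    static Row parse(String line) {
        // Station names never contain ';', so the first ';' separates name and value.
        int sep = line.indexOf(';');
        return new Row(line.substring(0, sep), parseTenths(line, sep + 1));
    }

    /** Parses a value such as "-12.3" starting at index from; assumes exactly one fractional digit. */
    static int parseTenths(String s, int from) {
        int i = from;
        boolean negative = s.charAt(i) == '-';
        if (negative) {
            i++;
        }
        int value = 0;
        for (; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '.') { // skip the decimal point; every other char is a digit
                value = value * 10 + (c - '0');
            }
        }
        return negative ? -value : value;
    }

    public static void main(String[] args) {
        for (String line : new String[] {"Hamburg;12.0", "St. John's;15.2", "Abha;-23.0"}) {
            Row row = parse(line);
            System.out.println(row.station() + " -> " + row.tenths());
        }
    }
}
```

Running main prints Hamburg -> 120, St. John's -> 152 and Abha -> -230. Integer tenths keep sums exact and avoid floating-point parsing in the hot loop.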
The task is to write a Java program which reads the file, calculates the min, mean, and max temperature value per weather station, and emits the results on stdout like this (i.e. sorted alphabetically by station name, and the result values per station in the format <min>/<mean>/<max>, rounded to one fractional digit):
{Abha=-23.0/18.0/59.2, Abidjan=-16.2/26.0/67.3, Abéché=-10.0/29.4/69.0, Accra=-10.1/26.4/66.4, Addis Ababa=-23.7/16.0/67.0, Adelaide=-27.8/17.3/58.5, ...}
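The following is a minimal single-threaded sketch of this task, applied to the ten example rows above. It is not the baseline implementation, and the class AggregateSketch is hypothetical. The sketch makes these choices:

- It keeps values as integer tenths.
- It sorts stations with a TreeMap. String's natural order compares UTF-16 code units, which places Abidjan before Abéché, as in the sample output.
- It rounds the mean with Math.round, which rounds ties toward positive infinity. This is an assumption about how the rounding rule is applied, so it is worth checking against the test suite.

```java
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

/** Hypothetical sketch: min/mean/max per station, sorted by name, in the 1BRC output format. */
public class AggregateSketch {

    /** Running statistics, kept in tenths of a degree so sums stay exact. */
    static final class Stats {
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        long sum;
        long count;

        void add(int tenths) {
            min = Math.min(min, tenths);
            max = Math.max(max, tenths);
            sum += tenths;
            count++;
        }

        @Override
        public String toString() {
            // Mean in tenths, rounded with Math.round (ties toward positive infinity; an assumption).
            double mean = Math.round((double) sum / count) / 10.0;
            return (min / 10.0) + "/" + mean + "/" + (max / 10.0);
        }
    }

    /** Aggregates "<station>;<value>" rows; the TreeMap keeps stations sorted by name. */
    static String aggregate(Iterable<String> lines) {
        Map<String, Stats> byStation = new TreeMap<>();
        for (String line : lines) {
            int sep = line.indexOf(';');
            String station = line.substring(0, sep);
            // Exactly one fractional digit, so value * 10 rounds to an exact integer of tenths.
            int tenths = (int) Math.round(Double.parseDouble(line.substring(sep + 1)) * 10);
            byStation.computeIfAbsent(station, k -> new Stats()).add(tenths);
        }
        StringJoiner out = new StringJoiner(", ", "{", "}");
        byStation.forEach((station, stats) -> out.add(station + "=" + stats));
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> rows = List.of("Hamburg;12.0", "Bulawayo;8.9", "Palembang;38.8",
                "St. John's;15.2", "Cracow;12.6", "Bridgetown;26.9", "Istanbul;6.2",
                "Roseau;34.4", "Conakry;31.2", "Istanbul;23.0");
        System.out.println(aggregate(rows));
    }
}
```

For the sample rows, Istanbul appears twice and yields Istanbul=6.2/14.6/23.0. Every other station has a single measurement, so its min, mean and max are equal.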
Submit your implementation by Jan 31 2024 and become part of the leaderboard!
These are the results from running all entries into the challenge on eight cores of a Hetzner AX161 dedicated server (32 core AMD EPYC™ 7502P (Zen2), 128 GB RAM).
# | Result (m:s.ms) | Implementation | JDK | Submitter | Notes |
---|---|---|---|---|---|
1 | 00:01.535 | link | 21.0.2-graal | Thomas Wuerthinger, Quan Anh Mai, Alfonso² Peterssen | GraalVM native binary, uses Unsafe |
2 | 00:01.587 | link | 21.0.2-graal | Artsiom Korzun | GraalVM native binary, uses Unsafe |
3 | 00:01.608 | link | 21.0.2-graal | Jaromir Hamala | GraalVM native binary, uses Unsafe |
| | 00:01.880 | link | 21.0.1-open | Serkan ÖZAL | uses Unsafe |
| | 00:01.921 | link | 21.0.2-graal | Van Phu DO | GraalVM native binary, uses Unsafe |
| | 00:02.018 | link | 21.0.2-graal | Stephen Von Worley | GraalVM native binary, uses Unsafe |
| | 00:02.157 | link | 21.0.2-graal | Roy van Rijn | GraalVM native binary, uses Unsafe |
| | 00:02.319 | link | 21.0.2-graal | Yavuz Tas | GraalVM native binary, uses Unsafe |
| | 00:02.332 | link | 21.0.2-graal | Marko Topolnik | GraalVM native binary, uses Unsafe |
| | 00:02.367 | link | 21.0.1-open | Quan Anh Mai | uses Unsafe |
| | 00:02.507 | link | 21.0.1-open | gonix | uses Unsafe |
| | 00:02.557 | link | 21.0.1-open | yourwass | uses Unsafe |
| | 00:02.820 | link | 22.ea.32-open | Li Lin | uses Unsafe |
| | 00:02.995 | link | 21.0.2-graal | tivrfoa | GraalVM native binary, uses Unsafe |
| | 00:02.997 | link | 21.0.1-open | gonix | |
| | 00:03.095 | link | 21.0.2-graal | Jamal Mulla | GraalVM native binary, uses Unsafe |
| | 00:03.210 | link | 21.0.1-open | Quan Anh Mai | |
| | 00:03.298 | link | 21.0.1-graal | Subrahmanyam (non-idiomatic) | uses Unsafe |
| | 00:03.431 | link | 21.0.1-graal | Roman Musin | GraalVM native binary, uses Unsafe |
| | 00:03.469 | link | 21.0.2-graal | Elliot Barlas | GraalVM native binary, uses Unsafe |
| | 00:03.698 | link | 21.0.1-graal | Jason Nochlin | |
| | 00:03.785 | link | 21.0.2-graal | zerninv | GraalVM native binary, uses Unsafe |
| | 00:03.820 | link | 21.0.2-graal | John Ziamos | GraalVM native binary, uses Unsafe |
| | 00:03.902 | link | 21.0.1-open | Juan Parera | |
| | 00:03.966 | link | 21.0.1-open | Jin Cong Ho | uses Unsafe |
| | 00:03.991 | link | 21.0.1-graal | Vaidhy Mayilrangam | uses Unsafe |
| | 00:04.066 | link | 21.0.1-open | JesseVanRooy | uses Unsafe |
| | 00:04.101 | link | 21.0.2-graal | Jaime Polidura | GraalVM native binary, uses Unsafe |
| | 00:04.209 | link | 21.0.1-open | Giovanni Cuccu | |
| | 00:04.474 | link | 21.0.1-open | Roman Stoffel | |
| | 00:04.676 | link | 21.0.2-tem | Peter Levart | |
| | 00:04.684 | link | 21.0.1-open | Florin Blanaru | uses Unsafe |
| | 00:04.701 | link | 21.0.1-open | Dr Ian Preston | |
| | 00:04.741 | link | 21.0.1-open | Cliff Click | uses Unsafe |
| | 00:04.800 | link | 21.0.1-open | Parker Timmins | |
| | 00:04.884 | link | 21.0.1-open | Aleksey Shipilëv | |
| | 00:04.920 | link | 21.0.1-graal | Subrahmanyam | |
| | 00:05.077 | link | 21.0.2-graal | Jonathan Wright | GraalVM native binary, uses Unsafe |
| | 00:05.142 | link | 21.0.1-open | Arjen Wisse | |
| | 00:05.167 | link | 21.0.2-open | Yevhenii Melnyk | |
| | 00:05.235 | link | 21.0.1-open | unbounded | |
| | 00:05.336 | link | java | Sumit Chaudhary | uses Unsafe |
| | 00:05.354 | link | 21.0.2-graal | Arman Sharif | GraalVM native binary, uses Unsafe |
| | 00:05.478 | link | 21.0.1-open | Olivier Bourgain | uses Unsafe |
| | 00:05.559 | link | 21.0.1-graal | Panagiotis Drakatos | GraalVM native binary |
| | 00:05.887 | link | 21.0.1-graal | Charlie Evans | uses Unsafe |
| | 00:05.979 | link | 21.0.1-graal | Sam Pullara | |
| | 00:06.166 | link | 21.0.1-open | Jamie Stansfield | |
| | 00:06.257 | link | 21.0.1-graal | Stefan Sprenger | uses Unsafe |
| | 00:06.392 | link | 21.0.2-graal | Diego Parra | |
| | 00:06.576 | link | 21.0.1-open | Andrew Sun | uses Unsafe |
| | 00:06.635 | link | 21.0.1-graal | Laake Scates-Gervasi | GraalVM native binary, uses Unsafe |
| | 00:06.654 | link | 21.0.1-graal | Jaroslav Bachorik | |
| | 00:06.715 | link | 21.0.1-open | Algirdas Raščius | |
| | 00:06.884 | link | 21.0.1-graal | rcasteltrione | |
| | 00:07.563 | link | 21.0.1-graal | 3j5a | |
| | 00:07.680 | link | 21.0.1-graal | Xylitol | uses Unsafe |
| | 00:07.712 | link | 21.0.1-graal | Anita SV | |
| | 00:07.730 | link | 21.0.1-open | Johannes Schüth | |
| | 00:07.894 | link | 21.0.2-tem | Antonio Muñoz | |
| | 00:07.925 | link | 21.0.1-graal | Ricardo Pieper | |
| | 00:08.157 | link | 21.0.1-open | JurenIvan | |
| | 00:08.167 | link | 21.0.1-tem | Dimitar Dimitrov | |
| | 00:08.214 | link | 21.0.1-open | deemkeen | |
| | 00:08.255 | link | 21.0.1-open | Mathias Bjerke | |
| | 00:08.398 | link | 21.0.1-open | Parth Mudgal | uses Unsafe |
| | 00:08.489 | link | 21.0.1-graal | Bang NGUYEN | |
| | 00:08.517 | link | 21.0.1-graal | ags | uses Unsafe |
| | 00:08.557 | link | 21.0.1-graal | Adrià Cabeza | |
| | 00:08.622 | link | 21.0.1-graal | Keshavram Kuduwa | uses Unsafe |
| | 00:08.892 | link | 21.0.1-open | Roman Romanchuk | |
| | 00:08.896 | link | 21.0.1-open | Andrzej Nestoruk | |
| | 00:09.020 | link | 21.0.1-open | yemreinci | |
| | 00:09.071 | link | 21.0.1-open | Gabriel Reid | |
| | 00:09.352 | link | 21.0.1-graal | Filip Hrisafov | |
| | 00:09.867 | link | 21.0.1-graal | Ricardo Pieper | |
| | 00:09.945 | link | 21.0.1-open | Anthony Goubard | |
| | 00:10.092 | link | 21.0.1-graal | Pratham | |
| | 00:10.127 | link | 21.0.1-open | Parth Mudgal | uses Unsafe |
| | 00:11.577 | link | 21.0.1-open | Eve | |
| | 00:10.473 | link | 21.0.1-open | Anton Rybochkin | |
| | 00:11.119 | link | 21.0.1-open | lawrey | |
| | 00:11.156 | link | java | Yann Moisan | |
| | 00:11.167 | link | 21.0.1-open | Nick Palmer | |
| | 00:11.352 | link | 21.0.1-open | karthikeyan97 | uses Unsafe |
| | 00:11.363 | link | 21.0.2-tem | Guruprasad Sridharan | |
| | 00:11.405 | link | 21.0.1-graal | Rafael Merino García | |
| | 00:11.406 | link | 21.0.1-graal | gabrielfoo | |
| | 00:11.433 | link | 21.0.1-graal | Jatin Gala | |
| | 00:11.505 | link | 21.0.1-open | Dmitry Bufistov | uses Unsafe |
| | 00:11.744 | link | 21.0.2-tem | Sebastian Lövdahl | |
| | 00:11.805 | link | 21.0.1-graal | Cool_Mineman | |
| | 00:11.934 | link | 21.0.1-open | arjenvaneerde | |
| | 00:12.220 | link | 21.0.1-open | Richard Startin | |
| | 00:12.495 | link | 21.0.1-graal | Samuel Yvon | GraalVM native binary |
| | 00:12.568 | link | 21.0.1-graal | Vlad | |
| | 00:12.800 | link | java | Yonatan Graber | |
| | 00:13.013 | link | 21.0.1-graal | Thanh Duong | |
| | 00:13.071 | link | 21.0.1-open | Dr Ian Preston | |
| | 00:13.729 | link | java | Cedric Boes | |
| | 00:13.817 | link | 21.0.1-open | Carlo | |
| | 00:14.502 | link | 21.0.1-graal | eriklumme | |
| | 00:14.772 | link | 21.0.1-open | Kevin McMurtrie | |
| | 00:14.867 | link | 21.0.1-open | Michael Berry | |
| | 00:14.900 | link | java | Judekeyser | |
| | 00:15.006 | link | java | Paweł Adamski | |
| | 00:15.662 | link | 21.0.1-open | Serghei Motpan | |
| | 00:16.063 | link | 21.0.1-open | Marek Kohn | |
| | 00:16.457 | link | 21.0.1-open | Aleksei | |
| | 00:16.953 | link | 21.0.1-open | Gaurav Anantrao Deshmukh | |
| | 00:17.046 | link | 21.0.1-open | Dimitris Karampinas | |
| | 00:17.086 | link | java | Breejesh Rathod | |
| | 00:17.490 | link | 21.0.1-open | Gergely Kiss | |
| | 00:17.255 | link | 21.0.1-open | tkosachev | |
| | 00:17.520 | link | 21.0.1-open | Farid | |
| | 00:17.717 | link | 21.0.1-open | Oleh Marchenko | |
| | 00:17.815 | link | 21.0.1-open | Hallvard Trætteberg | |
| | 00:17.932 | link | 21.0.1-open | Bartłomiej Pietrzyk | |
| | 00:18.251 | link | 21.0.1-graal | Markus Ebner | |
| | 00:18.448 | link | 21.0.1-open | Moysés Borges Furtado | |
| | 00:18.771 | link | 21.0.1-graal | David Kopec | |
| | 00:18.902 | link | 21.0.1-graal | Maxime | |
| | 00:19.357 | link | 21.0.1-graalce | Roman Schweitzer | |
| | 00:20.691 | link | 21.0.1-graal | Kidlike | GraalVM native binary |
| | 00:21.989 | link | 21.0.1-open | couragelee | |
| | 00:22.188 | link | 21.0.1-open | Jairo Graterón | |
| | 00:22.334 | link | 21.0.1-open | Alberto Venturini | |
| | 00:22.457 | link | 21.0.1-open | Ramzi Ben Yahya | |
| | 00:22.471 | link | 21.0.1-open | Shivam Agarwal | |
| | 00:24.986 | link | 21.0.1-open | kumarsaurav123 | |
| | 00:25.064 | link | 21.0.2-open | Sudhir Tumati | |
| | 00:26.500 | link | 21.0.1-open | Bruno Félix | |
| | 00:28.381 | link | 21.0.1-open | Hampus | |
| | 00:29.741 | link | 21.0.1-open | Matteo Vaccari | |
| | 00:32.018 | link | 21.0.1-open | Aurelian Tutuianu | |
| | 00:34.388 | link | 21.0.1-tem | Tobi | |
| | 00:35.875 | link | 21.0.1-open | MahmoudFawzyKhalil | |
| | 00:36.180 | link | 21.0.1-open | Horia Chiorean | |
| | 00:36.424 | link | java | Manish Garg | |
| | 00:38.340 | link | 21.0.1-open | AbstractKamen | |
| | 00:41.982 | link | 21.0.1-open | Chris Riccomini | |
| | 00:42.893 | link | 21.0.1-open | javamak | |
| | 00:46.597 | link | 21.0.1-open | Maeda-san | |
| | 00:58.811 | link | 21.0.1-open | Ujjwal Bharti | |
| | 01:05.094 | link | 21.0.1-open | Mudit Saxena | |
| | 01:05.979 | link | 21.0.1-graal | Hieu Dao Quang | |
| | 01:06.790 | link | 21.0.1-open | Karl Heinz Marbaise | |
| | 01:06.944 | link | 21.0.1-open | santanu | |
| | 01:07.014 | link | 21.0.1-open | pedestrianlove | |
| | 01:07.101 | link | 21.0.1-open | Jeevjyot Singh Chhabda | |
| | 01:08.811 | link | 21.0.1-open | Aleš Justin | |
| | 01:08.908 | link | 21.0.1-open | itaske | |
| | 01:09.595 | link | 21.0.1-tem | Antonio Goncalves | |
| | 01:09.882 | link | 21.0.1-open | Prabhu R | |
| | 01:14.815 | link | 21.0.1-open | twohardthings | |
| | 01:25.801 | link | 21.0.1-open | ivanklaric | |
| | 01:33.594 | link | 21.0.1-open | Gaurav Mathur | |
| | 01:53.208 | link | java | Mahadev K | |
| | 01:56.607 | link | 21.0.1-open | Abhilash | |
| | 03:43.521 | link | 21.0.1-open | 김예환 Ye-Hwan Kim (Sam) | |
| | 03:59.760 | link | 21.0.1-open | Samson | |
| | --- | | | | |
| | 04:49.679 | link (Baseline) | 21.0.1-open | Gunnar Morling | |
* These two entries have such a similar runtime (below the error margin I can reliably measure) that they share position #1 in the leaderboard.
Note that I am not super-scientific in the way I'm running the contenders (see Evaluating Results for the details). This is not a high-fidelity micro-benchmark and there can be variations of about ±5% between runs. So don't be too hung up on the exact ordering of your entry compared to others in close proximity. The primary purpose of this challenge is to learn something new, have fun along the way, and inspire others to do the same. The leaderboard is only a means to an end for achieving this goal. If you observe drastically different results, though, please open an issue.
See Entering the Challenge for instructions on how to enter the challenge with your own implementation. The Show & Tell features a wide range of 1BRC entries built using other languages, databases, and tools.
This section lists results from running the fastest N entries with different configurations. As entries have been optimized towards the specific conditions of the original challenge description and set-up (such as size of the key set), challenge entries may perform very differently across different configurations. These bonus results are provided here for informational purposes only. For the 1BRC challenge, only the results in the previous section are of importance.
For officially evaluating entries into the challenge, each contender is run on eight cores of the evaluation machine (AMD EPYC™ 7502P). Here are the results from running the top 25 entries (as of commit 1ba9cdcf, Feb 1) on all 32 cores / 64 threads (i.e. SMT is enabled) of the machine:
# | Result (m:s.ms) | Implementation | JDK | Submitter | Notes |
---|---|---|---|---|---|
1* | 00:00.324 | link | 21.0.2-graal | Jaromir Hamala | GraalVM native binary, uses Unsafe |
1* | 00:00.326 | link | 21.0.2-graal | Thomas Wuerthinger, Quan Anh Mai, Alfonso² Peterssen | GraalVM native binary, uses Unsafe |
2* | 00:00.350 | link | 21.0.2-graal | Artsiom Korzun | GraalVM native binary, uses Unsafe |
2* | 00:00.351 | link | 21.0.2-graal | Van Phu DO | GraalVM native binary, uses Unsafe |
3 | 00:00.389 | link | 21.0.2-graal | Stephen Von Worley | GraalVM native binary, uses Unsafe |
| | 00:00.410 | link | 21.0.2-graal | Yavuz Tas | GraalVM native binary, uses Unsafe |
| | 00:00.410 | link | 21.0.2-graal | Roy van Rijn | GraalVM native binary, uses Unsafe |
| | 00:00.502 | link | 21.0.2-graal | Marko Topolnik | GraalVM native binary, uses Unsafe |
| | 00:00.609 | link | 21.0.1-graal | Roman Musin | GraalVM native binary, uses Unsafe |
| | 00:00.611 | link | 21.0.1-open | gonixunsafe | uses Unsafe |
| | 00:00.716 | link | 21.0.2-graal | Jamal Mulla | GraalVM native binary, uses Unsafe |
| | 00:00.728 | link | 21.0.2-graal | tivrfoa | GraalVM native binary, uses Unsafe |
| | 00:00.764 | link | 21.0.1-open | Serkan ÖZAL | uses Unsafe |
| | 00:00.785 | link | 21.0.2-graal | Elliot Barlas | GraalVM native binary, uses Unsafe |
| | 00:00.814 | link | 21.0.1-open | gonix | |
| | 00:00.838 | link | 21.0.2-graal | zerninv | GraalVM native binary, uses Unsafe |
| | 00:00.877 | link | 21.0.2-graal | John Ziamos | GraalVM native binary, uses Unsafe |
| | 00:01.179 | link | 21.0.1-graal | vemanaNonIdiomatic | uses Unsafe |
| | 00:01.268 | link | 21.0.1-open | merykittyunsafe | uses Unsafe |
| | 00:01.289 | link | 22.ea.32-open | Li Lin | uses Unsafe |
| | 00:01.345 | link | 21.0.1-graal | Jason Nochlin | |
| | 00:01.393 | link | 21.0.1-open | Quan Anh Mai | |
| | 00:01.478 | link | 21.0.1-open | yourwass | uses Unsafe |
| | 00:01.770 | link | 21.0.1-open | Jin Cong Ho | uses Unsafe |
| | 00:02.918 | link | 21.0.1-open | Juan Parera | |
The 1BRC challenge data set contains 413 distinct weather stations, whereas the rules allow for 10,000 different station names to occur. Here are the results from running the top 25 entries (as of commit 1ba9cdcf, Feb 1) against 1,000,000,000 measurement values across 10K stations (created via ./create_measurements3.sh 1000000000), using eight cores on the evaluation machine:
# | Result (m:s.ms) | Implementation | JDK | Submitter | Notes |
---|---|---|---|---|---|
1 | 00:02.977 | link | 21.0.2-graal | Artsiom Korzun | GraalVM native binary, uses Unsafe |
2 | 00:03.068 | link | 21.0.2-graal | Marko Topolnik | GraalVM native binary, uses Unsafe |
3 | 00:03.175 | link | 21.0.2-graal | Stephen Von Worley | GraalVM native binary, uses Unsafe |
| | 00:04.022 | link | 21.0.2-graal | Roy van Rijn | GraalVM native binary, uses Unsafe |
| | 00:04.047 | link | 21.0.2-graal | Jaromir Hamala | GraalVM native binary, uses Unsafe |
| | 00:04.122 | link | 21.0.1-open | gonixunsafe | uses Unsafe |
| | 00:04.520 | link | 21.0.2-graal | tivrfoa | GraalVM native binary, uses Unsafe |
| | 00:04.655 | link | 21.0.2-graal | Jamal Mulla | GraalVM native binary, uses Unsafe |
| | 00:04.708 | link | 21.0.1-open | gonix | |
| | 00:04.797 | link | 21.0.2-graal | Thomas Wuerthinger, Quan Anh Mai, Alfonso² Peterssen | GraalVM native binary, uses Unsafe |
| | 00:04.814 | link | 21.0.1-graal | vemanaNonIdiomatic | uses Unsafe |
| | 00:05.248 | link | 21.0.2-graal | zerninv | GraalVM native binary, uses Unsafe |
| | 00:05.367 | link | 21.0.2-graal | Yavuz Tas | GraalVM native binary, uses Unsafe |
| | 00:05.894 | link | 21.0.2-graal | Elliot Barlas | GraalVM native binary, uses Unsafe |
| | 00:06.014 | link | 21.0.2-graal | Van Phu DO | GraalVM native binary, uses Unsafe |
| | 00:06.380 | link | 21.0.2-graal | John Ziamos | GraalVM native binary, uses Unsafe |
| | 00:08.830 | link | 21.0.1-open | Serkan ÖZAL | uses Unsafe |
| | 00:09.349 | link | 21.0.1-open | yourwass | uses Unsafe |
| | 00:10.388 | link | 21.0.1-open | merykittyunsafe | uses Unsafe |
| | 00:12.467 | link | 21.0.1-open | Juan Parera | |
| | 00:13.225 | link | 21.0.1-open | Quan Anh Mai | |
| | 00:15.901 | link | 21.0.1-open | Jin Cong Ho | uses Unsafe |
| | 00:17.972 | link | 21.0.1-graal | Jason Nochlin | |
| | 00:20.174 | link | 21.0.1-graal | Roman Musin | GraalVM native binary, uses Unsafe |
| | 00:21.591 | link | 22.ea.32-open | Li Lin | uses Unsafe |
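One plausible factor behind the reshuffled ranking with 10K stations is how an entry's per-station lookup structure copes with a larger key set. As a hedged illustration, not taken from any listed entry, here is a hypothetical open-addressing table with linear probing. It is sized to a power of two comfortably above the 10,000-station limit and keyed by the raw UTF-8 bytes of each name, so names never have to be decoded into Strings in the hot path.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical per-station table: open addressing with linear probing over UTF-8 name bytes. */
public class StationTable {
    // 2^15 = 32,768 slots: a power of two (cheap masking) and over 3x the 10,000-station limit.
    // With at most 10,000 keys the table never fills, so probing always terminates.
    private static final int CAPACITY = 1 << 15;
    private static final int MASK = CAPACITY - 1;

    private final byte[][] keys = new byte[CAPACITY][];
    private final int[] min = new int[CAPACITY];
    private final int[] max = new int[CAPACITY];
    private final long[] sum = new long[CAPACITY];
    private final long[] count = new long[CAPACITY];
    private int size;

    /** Records one value (in tenths of a degree) for the station whose name is buf[from, to). */
    void add(byte[] buf, int from, int to, int tenths) {
        int slot = hash(buf, from, to) & MASK;
        while (keys[slot] != null && !Arrays.equals(keys[slot], 0, keys[slot].length, buf, from, to)) {
            slot = (slot + 1) & MASK; // linear probing: try the next slot
        }
        if (keys[slot] == null) { // first occurrence of this station
            keys[slot] = Arrays.copyOfRange(buf, from, to);
            min[slot] = Integer.MAX_VALUE;
            max[slot] = Integer.MIN_VALUE;
            size++;
        }
        min[slot] = Math.min(min[slot], tenths);
        max[slot] = Math.max(max[slot], tenths);
        sum[slot] += tenths;
        count[slot]++;
    }

    /** Returns {min, max, sum, count} for a station, or null if it was never added. */
    long[] stats(String station) {
        byte[] name = station.getBytes(StandardCharsets.UTF_8);
        int slot = hash(name, 0, name.length) & MASK;
        while (keys[slot] != null) {
            if (Arrays.equals(keys[slot], name)) {
                return new long[] {min[slot], max[slot], sum[slot], count[slot]};
            }
            slot = (slot + 1) & MASK;
        }
        return null;
    }

    int size() {
        return size;
    }

    /** Polynomial hash over the bytes, with high bits folded into the low bits used by MASK. */
    private static int hash(byte[] b, int from, int to) {
        int h = 1;
        for (int i = from; i < to; i++) {
            h = 31 * h + b[i];
        }
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        StationTable table = new StationTable();
        byte[] row = "Hamburg;12.0".getBytes(StandardCharsets.UTF_8);
        table.add(row, 0, 7, 120); // name bytes are row[0, 7) = "Hamburg"
        System.out.println(table.size() + " " + Arrays.toString(table.stats("Hamburg")));
    }
}
```

The capacity is an assumption chosen to keep the load factor low even with 10,000 keys. A table sized tightly for a few hundred keys would see much longer probe chains on the 10K data set.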
Java 21 must be installed on your system.
This repository contains two programs:
- dev.morling.onebrc.CreateMeasurements (invoked via create_measurements.sh): Creates the file measurements.txt in the root directory of this project with a configurable number of random measurement values
- dev.morling.onebrc.CalculateAverage (invoked via calculate_average_baseline.sh): Calculates the average values for the file measurements.txt
Execute the following steps to run the challenge:
- Build the project using Apache Maven:
  ./mvnw clean verify
- Create the measurements file with 1B rows (just once):
  ./create_measurements.sh 1000000000
  This will take a few minutes. Attention: the generated file has a size of approx. 12 GB, so make sure to have enough disk space.
  If you're running the challenge with a non-Java language, there's a non-authoritative Python script to generate the measurements file at src/main/python/create_measurements.py. The authoritative method for generating the measurements is the Java program dev.morling.onebrc.CreateMeasurements.
- Calculate the average measurement values:
  ./calculate_average_baseline.sh
  The provided naive example implementation uses the Java streams API for processing the file. It completes the task in just under five minutes (04:49.679, per the baseline entry in the leaderboard above) on the environment used for result evaluation. It serves as the baseline for comparing your own implementation.
- Optimize the heck out of it:
  Adjust the CalculateAverage program to speed it up, in any way you see fit (just sticking to a few rules described below). Options include:
  - parallelizing the computation
  - using the (incubating) Vector API
  - memory-mapping different sections of the file concurrently
  - using AppCDS, GraalVM, CRaC, etc. for speeding up the application start-up
  - choosing and tuning the garbage collector
  - and much more
  Tip: if you have jbang installed, you can get a flamegraph of your program by running async-profiler via ap-loader:
  jbang --javaagent=ap-loader@jvm-profiling-tools/ap-loader=start,event=cpu,file=profile.html -m dev.morling.onebrc.CalculateAverage_yourname target/average-1.0.0-SNAPSHOT.jar
  or directly on the .java file:
  jbang --javaagent=ap-loader@jvm-profiling-tools/ap-loader=start,event=cpu,file=profile.html src/main/java/dev/morling/onebrc/CalculateAverage_yourname.java
  When you run this, it will generate a flamegraph in profile.html. You can then open this in a browser and see where your program is spending its time.
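This is a hedged sketch of one of the options above: memory-mapping the file and processing sections concurrently. The hypothetical ChunkSplitter below picks roughly equal chunk boundaries. It then nudges each boundary forward to just past a newline, so that no row is split between two workers. As a stand-in for real per-chunk aggregation, it counts rows per chunk in parallel.

Note that a single MappedByteBuffer is limited to Integer.MAX_VALUE bytes. The 12 GB measurements file would therefore need one mapping per section, or a different API. This sketch only illustrates the boundary logic.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.stream.IntStream;

/** Hypothetical sketch: newline-aligned chunk boundaries plus parallel per-chunk work. */
public class ChunkSplitter {

    /** Returns boundaries {0, b1, ..., size}; every inner boundary sits right after a '\n'. */
    static int[] boundaries(ByteBuffer data, int chunks) {
        int size = data.limit();
        int[] bounds = new int[chunks + 1];
        for (int i = 1; i < chunks; i++) {
            // Tentative split point, never before the previous boundary.
            int pos = Math.max((int) ((long) size * i / chunks), bounds[i - 1]);
            // Move forward until the previous byte is '\n' (or we hit either end of the data).
            while (pos > 0 && pos < size && data.get(pos - 1) != '\n') {
                pos++;
            }
            bounds[i] = pos;
        }
        bounds[chunks] = size;
        return bounds;
    }

    /** Counts '\n' bytes per chunk in parallel; stands in for real per-chunk aggregation. */
    static long countRows(ByteBuffer data, int[] bounds) {
        return IntStream.range(0, bounds.length - 1).parallel().mapToLong(c -> {
            ByteBuffer view = data.duplicate(); // per-task view; absolute reads only
            long rows = 0;
            for (int i = bounds[c]; i < bounds[c + 1]; i++) {
                if (view.get(i) == '\n') {
                    rows++;
                }
            }
            return rows;
        }).sum();
    }

    public static void main(String[] args) throws IOException {
        Path file = Path.of(args.length > 0 ? args[0] : "measurements.txt");
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // map() rejects sizes above Integer.MAX_VALUE, hence the caveat above.
            MappedByteBuffer data = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            int[] bounds = boundaries(data, Runtime.getRuntime().availableProcessors());
            System.out.println(Arrays.toString(bounds) + " rows=" + countRows(data, bounds));
        }
    }
}
```

Aligning boundaries in a single pass up front keeps workers independent. Each worker only needs its [start, end) range and never coordinates with its neighbors about partial rows.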
- Any of these Java distributions may be used:
  - Any builds provided by SDKMan
  - Early access builds available on openjdk.net (including EA builds for OpenJDK projects like Valhalla)
  - Builds on builds.shipilev.net
  If you want to use a build not available via these channels, reach out to discuss whether it can be considered.
- No external library dependencies may be used
- Implementations must be provided as a single source file
- The computation must happen at application runtime, i.e. you cannot process the measurements file at build time (for instance, when using GraalVM) and just bake the result into the binary
- Input value ranges are as follows:
  - Station name: non-null UTF-8 string of min length 1 character and max length 100 bytes, containing neither ; nor \n characters (i.e. this could be 100 one-byte characters, or 50 two-byte characters, etc.)
  - Temperature value: non-null double between -99.9 (inclusive) and 99.9 (inclusive), always with one fractional digit
- There is a maximum of 10,000 unique station names
- Line endings in the file are \n characters on all platforms
- Implementations must not rely on specifics of a given data set, e.g. any valid station name as per the constraints above and any data distribution (number of measurements per station) must be supported
- The rounding of output values must be done using the semantics of IEEE 754 rounding-direction "roundTowardPositive"
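Here is a hypothetical validator for a single row under the input constraints listed above. It is not part of the 1BRC test suite, and the names are illustrative. It checks the byte length of the station name in UTF-8 rather than its character count. It accepts temperatures only in the form of an optional minus sign, one or two digits, a point, and one digit, which covers exactly -99.9 to 99.9 with one fractional digit.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

/** Hypothetical checker for one "<station>;<value>" row against the 1BRC input constraints. */
public class RowValidator {

    // Optional '-', one or two ASCII digits, '.', exactly one fractional digit: -99.9 .. 99.9.
    private static final Pattern VALUE = Pattern.compile("-?\\d{1,2}\\.\\d");

    static boolean isValid(String line) {
        int sep = line.indexOf(';');
        // Name must be non-empty; the row may contain no second ';' and no '\n'.
        if (sep < 1 || line.indexOf(';', sep + 1) >= 0 || line.indexOf('\n') >= 0) {
            return false;
        }
        // The 100 limit is in UTF-8 bytes, not characters.
        if (line.substring(0, sep).getBytes(StandardCharsets.UTF_8).length > 100) {
            return false;
        }
        return VALUE.matcher(line.substring(sep + 1)).matches();
    }

    public static void main(String[] args) {
        for (String row : new String[] {"Hamburg;12.0", "Abha;-99.9", "Abha;100.0", "Abha;12.05"}) {
            System.out.println(row + " -> " + isValid(row));
        }
    }
}
```

A name of 50 two-byte characters such as é is accepted, while 51 of them (102 bytes) is rejected, matching the byte-based limit in the rules.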
To submit your own implementation to 1BRC, follow these steps:
- Create a fork of the onebrc GitHub repository.
- Run ./create_fork.sh <your_GH_user> to copy the baseline implementation to your personal files, or do this manually:
  - Create a copy of CalculateAverage_baseline.java, named CalculateAverage_<your_GH_user>.java, e.g. CalculateAverage_doloreswilson.java.
  - Create a copy of calculate_average_baseline.sh, named calculate_average_<your_GH_user>.sh, e.g. calculate_average_doloreswilson.sh.
  - Adjust that script so that it references your implementation class name. If needed, provide any JVM arguments via the JAVA_OPTS variable in that script. Make sure that script does not write anything to standard output other than calculation results.
  - (Optional) OpenJDK 21 is used by default. If a custom JDK build is required, create a copy of prepare_baseline.sh, named prepare_<your_GH_user>.sh, e.g. prepare_doloreswilson.sh. Include the SDKMAN command sdk use java [version] in your prepare script.
  - (Optional) If you'd like to use native binaries (GraalVM), add all the required build logic to your prepare_<your_GH_user>.sh script.
- Make that implementation fast. Really fast.
- Run the test suite by executing ./test.sh <your_GH_user>; if any differences are reported, fix them before submitting your implementation.
- Create a pull request against the upstream repository, clearly stating:
  - The name of your implementation class.
  - The execution time of the program on your system and specs of the same (CPU, number of cores, RAM). This is for informative purposes only; the official runtime will be determined as described below.
- I will run the program, determine its performance as described in the next section, and enter the result in the scoreboard.
Note: I reserve the right to not evaluate specific submissions if I feel doubtful about the implementation (i.e. I won't run your Bitcoin miner ;).
If you'd like to discuss any potential ideas for implementing 1BRC with the community, you can use the GitHub Discussions of this repository. Please keep it friendly and civil.
The challenge runs until Jan 31 2024. Any submissions (i.e. pull requests) created after Jan 31 2024 23:59 UTC will not be considered.
Results are determined by running the program on a Hetzner AX161 dedicated server (32 core AMD EPYC™ 7502P (Zen2), 128 GB RAM).
Programs are run from a RAM disk (i.e. the I/O overhead for loading the file from disk is not relevant), using 8 cores of the machine.
Each contender must pass the 1BRC test suite (./test.sh).
The hyperfine program is used for measuring execution times of the launch scripts of all entries, i.e. end-to-end times are measured.
Each contender is run five times in a row.
The slowest and the fastest runs are discarded.
The mean value of the remaining three runs is the result for that contender and will be added to the results table above.
The exact same measurements.txt file is used for evaluating all contenders.
See the script evaluate.sh for the exact implementation of the evaluation steps.
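The scoring arithmetic described above (five runs, fastest and slowest discarded, mean of the remaining three) is a trimmed mean. Below is a minimal sketch with a hypothetical TrimmedMean class. evaluate.sh and hyperfine remain the authoritative implementation; this only mirrors the described arithmetic.

```java
import java.util.Arrays;

/** Hypothetical sketch of the scoring rule: drop the fastest and slowest of five runs, average the rest. */
public class TrimmedMean {

    static double score(double[] runSeconds) {
        if (runSeconds.length != 5) {
            throw new IllegalArgumentException("expected exactly five runs");
        }
        double[] sorted = runSeconds.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        // sorted[0] is the fastest run and sorted[4] the slowest; both are discarded.
        return (sorted[1] + sorted[2] + sorted[3]) / 3.0;
    }

    public static void main(String[] args) {
        System.out.println(score(new double[] {5, 1, 3, 2, 4})); // prints 3.0
    }
}
```

Discarding both extremes makes a single outlier run, such as one disturbed by background activity, unable to move the result on its own.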
If you enter this challenge, you may learn something new, get to inspire others, and take pride in seeing your name listed in the scoreboard above. Rumor has it that the winner may receive a unique 1️⃣🐝🏎️ t-shirt, too!
Q: Can I use Kotlin or other JVM languages other than Java?
A: No, this challenge is focussed on Java only. Feel free to unofficially share implementations significantly outperforming any listed results, though.
Q: Can I use non-JVM languages and/or tools?
A: No, this challenge is focussed on Java only. Feel free to unofficially share interesting implementations and results, though. For instance, it would be interesting to see how DuckDB fares with this task.
Q: I've got an implementation—but it's not in Java. Can I share it somewhere?
A: Whilst non-Java solutions cannot be formally submitted to the challenge, you are welcome to share them over in the Show and tell GitHub discussion area.
Q: Can I use JNI?
A: Submissions must be completely implemented in Java, i.e. you cannot write JNI glue code in C/C++. You could use AOT compilation of Java code via GraalVM, though, either by AOT-compiling the entire application, or by creating a native library (see here).
Q: What is the encoding of the measurements.txt file?
A: The file is encoded with UTF-8.
Q: Can I make assumptions on the names of the weather stations showing up in the data set?
A: No. While only a fixed set of station names is used by the data set generator, any solution should work with arbitrary UTF-8 station names (for the sake of simplicity, names are guaranteed to contain no ; or \n characters).
Q: Can I copy code from other submissions?
A: Yes, you can. The primary focus of the challenge is about learning something new, rather than "winning". When you do so, please give credit to the relevant source submissions. Please don't re-submit other entries with no or only trivial improvements.
Q: Which operating system is used for evaluation?
A: Fedora 39.
Q: My solution runs in 2 sec on my machine. Am I the fastest 1BRC-er in the world?
A: Probably not :) 1BRC results are reported in wall-clock time, so results of different implementations are only comparable when obtained on the same machine. If, for instance, an implementation is faster on a 32-core workstation than on the 8-core evaluation instance, this doesn't allow for any conclusions. When sharing 1BRC results, you should also always share the result of running the baseline implementation on the same hardware.
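One simple way to put such numbers in context is a speed-up factor relative to the baseline measured on the same machine. The hypothetical Speedup class below parses the leaderboard's m:s.ms format and divides the two times. With the baseline (04:49.679) and the top entry (00:01.535) from the leaderboard above, it prints 188.7x.

```java
import java.util.Locale;

/** Hypothetical helper: speed-up of an entry over the baseline, both measured on the same machine. */
public class Speedup {

    /** Parses the leaderboard's "m:s.ms" format (e.g. "04:49.679"); assumes a 3-digit ms part. */
    static long toMillis(String time) {
        String[] minutesAndRest = time.split(":");
        String[] secondsAndMillis = minutesAndRest[1].split("\\.");
        return Long.parseLong(minutesAndRest[0]) * 60_000
                + Long.parseLong(secondsAndMillis[0]) * 1_000
                + Long.parseLong(secondsAndMillis[1]);
    }

    /** How many times faster the entry is than the baseline. */
    static double speedup(String baseline, String entry) {
        return (double) toMillis(baseline) / toMillis(entry);
    }

    public static void main(String[] args) {
        // Locale.ROOT keeps '.' as the decimal separator regardless of the system locale.
        System.out.printf(Locale.ROOT, "%.1fx%n", speedup("04:49.679", "00:01.535"));
    }
}
```

A ratio like this is still only meaningful when both times come from the same hardware, as the answer above stresses.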
Q: Why 1️⃣🐝🏎️ ?
A: It's the abbreviation of the project name: One Billion Row Challenge.
A list of external resources such as blog posts and videos, discussing 1BRC and specific implementations:
- The One Billion Row Challenge Shows That Java Can Process a One Billion Rows File in Two Seconds, by Olimpiu Pop (interview)
- Cliff Click discussing his 1BRC solution on the Coffee Compiler Club (video)
- 1️⃣🐝🏎️🦆 (1BRC in SQL with DuckDB), by Robin Moffatt (blog post)
- 1 billion rows challenge in PostgreSQL and ClickHouse, by Francesco Tisiot (blog post)
- The One Billion Row Challenge with Snowflake, by Sean Falconer (blog post)
- One billion row challenge using base R, by David Schoch (blog post)
- 1 Billion Row Challenge with Apache Pinot, by Hubert Dulay (blog post)
- One Billion Row Challenge In C, by Danny Van Kooten (blog post)
- One Billion Row Challenge in Racket, by Bogdan Popa (blog post)
- The One Billion Row Challenge - .NET Edition, by Frank A. Krueger (podcast)
- One Billion Row Challenge, by Ragnar Groot Koerkamp (blog post)
- ClickHouse and The One Billion Row Challenge, by Dale McDiarmid (blog post)
- One Billion Row Challenge & Azure Data Explorer, by Niels Berglund (blog post)
- One Billion Row Challenge - view from sidelines, by Leo Chashnikov (blog post)
A big thank you to my employer Decodable for funding the evaluation environment and supporting this challenge!
This code base is available under the Apache License, version 2.
Be excellent to each other! More than winning, the purpose of this challenge is to have fun and learn something new.