diff --git a/docs/02-hpc-tutorials/04-cpp-introduction/02-cpp-build-manually.md b/docs/02-hpc-tutorials/04-cpp-introduction/02-cpp-build-manually.md
index a62860be..1e90435d 100644
--- a/docs/02-hpc-tutorials/04-cpp-introduction/02-cpp-build-manually.md
+++ b/docs/02-hpc-tutorials/04-cpp-introduction/02-cpp-build-manually.md
@@ -184,7 +184,7 @@ this file there are two options.
g++ src/database_lib.cc -I${PWD}/include -fpic -c -o build/database_lib.o
```
--I${PWD}/include will ensure the compiler searches the include directory
+``-I${PWD}/include`` will ensure the compiler searches the include directory
for headers

### Fix 2: Environment Variables

@@ -234,7 +234,7 @@ g++ src/grocery_db.cc -I${PWD}/include -L${PWD}/build -ldatabase_lib -o build/gr
g++ src/movies_db.cc -I${PWD}/include -L${PWD}/build -ldatabase_lib -o build/movies_db
```
--L${PWD}/build tells the compiler to search this directory for shared objects.
+``-L${PWD}/build`` tells the compiler to search this directory for shared objects.

### Fix 2: Environment Variables

diff --git a/docs/02-hpc-tutorials/04-cpp-introduction/03-cpp-build-with-cmake.md b/docs/02-hpc-tutorials/04-cpp-introduction/03-cpp-build-with-cmake.md
index aea22607..482ab896 100644
--- a/docs/02-hpc-tutorials/04-cpp-introduction/03-cpp-build-with-cmake.md
+++ b/docs/02-hpc-tutorials/04-cpp-introduction/03-cpp-build-with-cmake.md
@@ -187,7 +187,7 @@ set(CMAKE_ARCHIVE_OUTPUT_DIRECTORY
CMAKE_BINARY_DIR is automatically provided by CMake. This is the absolute
path to the directory which contains the root CMake. In our case, this would
-be "cd ${GRC_TUTORIAL}/cpp/03-cpp-build-with-cmake".
+be ``${GRC_TUTORIAL}/cpp/03-cpp-build-with-cmake``.

In this example, we output all executables and shared objects to the bin
directory.

@@ -259,14 +259,14 @@ include_directories(${CMAKE_SOURCE_DIR}/include)
```

*include_directories* will ensure that header files can be discovered
-by the C++ compiler. 
This is analagous to the "-I" flag in the gcc
+by the C++ compiler. This is analogous to the ``-I`` flag in the gcc
compiler. Here we ensure that the compiler will search the directory
-${CMAKE_SOURCE_DIR}/include for header files.
+``${CMAKE_SOURCE_DIR}/include`` for header files.

CMAKE_SOURCE_DIR is provided automatically by CMake. It
represents the absolute path to the directory containing the root
CMakeLists.txt. In our case, this constant would expand to
-"cd ${GRC_TUTORIAL}/cpp/03-cpp-build-with-cmake/".
+``${GRC_TUTORIAL}/cpp/03-cpp-build-with-cmake/``.

### Creating a Shared Library
```cmake
@@ -283,12 +283,12 @@ target_link_libraries(database_lib
input the path to all source files related to the build. The SHARED indicates
this library is shared (as opposed to static). Here, there is only one source
file, datbase_lib.cc. The output of this command will be "libdatabase_lib.so" in
-the "build/lib" directory.
+the ``build/lib`` directory.

CMAKE_CURRENT_SOURCE_DIR is provided automatically by CMake.
It represents the absolute path to the directory containing the
CMakeLists.txt currently being processed. In our case, this constant
would expand to
-"cd ${GRC_TUTORIAL}/cpp/03-cpp-build-with-cmake/src".
+``${GRC_TUTORIAL}/cpp/03-cpp-build-with-cmake/src``.

*target_link_libraries* will link
all necessary libraries necessary to compile the target database_lib. This
is analagous to the "-l" flag in gcc. In our case, we link against the
@@ -339,14 +339,14 @@ install(
*install* defines what happens when a user calls "make install". In this
case we specify that our targets database_lib, grocery_db, and movies_db
-should be installed into one of LIBRARY, ARCHIVE, or RUNTIME depending
+should be installed into one of ``LIBRARY``, ``ARCHIVE``, or ``RUNTIME`` depending
on its type. For example, database_lib will be installed to LIBRARY (since
we used add_library), whereas grocery_db and movies_db will be installed
-to RUNTIME (since we used add_executable). 
+to ``RUNTIME`` (since we used add_executable). -CMAKE_INSTALL_PREFIX is a constant provided by CMake which represents +``CMAKE_INSTALL_PREFIX`` is a constant provided by CMake which represents where files should be installed. This can be configured by users by passing --DCMAKE_INSTALL_PREFIX to their CMake build. By default, the value of this +``-DCMAKE_INSTALL_PREFIX`` to their CMake build. By default, the value of this constant is /usr. ### Installing Header Files @@ -364,8 +364,8 @@ install( ``` In this case, we use *install* to specify -that the specific file ${CMAKE_SOURCE_DIR}/include/database_lib.h should be -installed to ${CMAKE_INSTALL_PREFIX}/include. Here, we use the keyword +that the specific file ``${CMAKE_SOURCE_DIR}/include/database_lib.h`` should be +installed to ``${CMAKE_INSTALL_PREFIX}/include``. Here, we use the keyword FILES instead of the keyword TARGET. Targets are defined using a CMake function such as add_executable or add_library. Files are just the way they are with no modification. @@ -386,7 +386,7 @@ set_property(TEST test_movies_db PROPERTY ENVIRONMENT ``` *add_test* creates a CTest case. Here we create two tests: test_grocery_db -and test_movies_db. The test will execute the command ${CMAKE_RUNTIME_OUTPUT_DIRECTORY}/grocery_db. +and test_movies_db. The test will execute the command ``${CMAKE_RUNTIME_OUTPUT_DIRECTORY}/grocery_db``. CMAKE_RUNTIME_OUTPUT_DIRECTORY is a constant provided by CMake. It is the location where an executable is installed after performing the @@ -394,10 +394,10 @@ is the location where an executable is installed after performing the *set_property* sets some sort of property about a target. In this case the target is the test case test_movies_db. We are setting -an environment variable LD_LIBRARY_PATH. From section 3.2, we +an environment variable ``LD_LIBRARY_PATH``. From section 3.2, we saw that we needed to be very careful about ensuring the OS knows where shared libraries are located. 
In this case, we -ensure the OS will check the path ${CMAKE_LIBRARY_OUTPUT_DIRECTORY}. +ensure the OS will check the path ``${CMAKE_LIBRARY_OUTPUT_DIRECTORY}``. CMAKE_LIBRARY_OUTPUT_DIRECTORY is a constant provided by CMake. It is the location where a shared library is installed after performing @@ -456,11 +456,12 @@ This will run unit tests verbosely, meaning that terminal outputs will not be hidden. -VV indicates making the tests verbose You should see something like: -
UpdateCTestConfiguration  from :/home/lukemartinlogan/Documents/Projects/PhD/scs-tutorial/3.3.building_cpp_cmake/build/DartConfiguration.tcl
-Parse Config file:/home/lukemartinlogan/Documents/Projects/PhD/scs-tutorial/3.3.building_cpp_cmake/build/DartConfiguration.tcl
-UpdateCTestConfiguration  from :/home/lukemartinlogan/Documents/Projects/PhD/scs-tutorial/3.3.building_cpp_cmake/build/DartConfiguration.tcl
-Parse Config file:/home/lukemartinlogan/Documents/Projects/PhD/scs-tutorial/3.3.building_cpp_cmake/build/DartConfiguration.tcl
-Test project /home/lukemartinlogan/Documents/Projects/PhD/scs-tutorial/3.3.building_cpp_cmake/build
+```bash
+UpdateCTestConfiguration  from :/home/luke/Documents/Projects/grc-tutorial/cpp/03-cpp-build-with-cmake/build/DartConfiguration.tcl
+Parse Config file:/home/luke/Documents/Projects/grc-tutorial/cpp/03-cpp-build-with-cmake/build/DartConfiguration.tcl
+UpdateCTestConfiguration  from :/home/luke/Documents/Projects/grc-tutorial/cpp/03-cpp-build-with-cmake/build/DartConfiguration.tcl
+Parse Config file:/home/luke/Documents/Projects/grc-tutorial/cpp/03-cpp-build-with-cmake/build/DartConfiguration.tcl
+Test project /home/luke/Documents/Projects/grc-tutorial/cpp/03-cpp-build-with-cmake/build
 Constructing a list of tests
 Done constructing a list of tests
 Updating test list for fixtures
@@ -470,9 +471,10 @@ Checking test dependency graph end
 test 1
     Start 1: test_grocery_db
 
-1: Test command: /home/lukemartinlogan/Documents/Projects/PhD/scs-tutorial/3.3.building_cpp_cmake/build/bin/grocery_db
-1: Environment variables:
-1:  LD_LIBRARY_PATH=/home/lukemartinlogan/Documents/Projects/PhD/scs-tutorial/3.3.building_cpp_cmake/build/bin
+1: Test command: /home/luke/Documents/Projects/grc-tutorial/cpp/03-cpp-build-with-cmake/build/bin/grocery_db
+1: Working Directory: /home/luke/Documents/Projects/grc-tutorial/cpp/03-cpp-build-with-cmake/build/test
+1: Environment variables: 
+1:  LD_LIBRARY_PATH=/home/luke/Documents/Projects/grc-tutorial/cpp/03-cpp-build-with-cmake/build/bin
 1: Test timeout computed to be: 1500
 1: grocery: in create
 1: grocery: in read
@@ -482,9 +484,10 @@ test 1
 test 2
     Start 2: test_movies_db
 
-2: Test command: /home/lukemartinlogan/Documents/Projects/PhD/scs-tutorial/3.3.building_cpp_cmake/build/bin/movies_db
-2: Environment variables:
-2:  LD_LIBRARY_PATH=/home/lukemartinlogan/Documents/Projects/PhD/scs-tutorial/3.3.building_cpp_cmake/build/bin
+2: Test command: /home/luke/Documents/Projects/grc-tutorial/cpp/03-cpp-build-with-cmake/build/bin/movies_db
+2: Working Directory: /home/luke/Documents/Projects/grc-tutorial/cpp/03-cpp-build-with-cmake/build/test
+2: Environment variables: 
+2:  LD_LIBRARY_PATH=/home/luke/Documents/Projects/grc-tutorial/cpp/03-cpp-build-with-cmake/build/bin
 2: Test timeout computed to be: 1500
 2: movies: in create
 2: movies: in read
@@ -492,10 +495,10 @@ test 2
 2: movies: in delete
 2/2 Test #2: test_movies_db ...................   Passed    0.00 sec
 
-100% tests passed, 0 tests failed out of 2
+100% tests passed, 0 tests failed out of 2
 
 Total Test time (real) =   0.01 sec
-
+```
#### Installing

diff --git a/docs/02-hpc-tutorials/04-cpp-introduction/04-cpp-basic-syntax.md b/docs/02-hpc-tutorials/04-cpp-introduction/04-cpp-basic-syntax.md
index c8ee4237..26694f87 100644
--- a/docs/02-hpc-tutorials/04-cpp-introduction/04-cpp-basic-syntax.md
+++ b/docs/02-hpc-tutorials/04-cpp-introduction/04-cpp-basic-syntax.md
@@ -261,15 +261,15 @@ in 3.06.
|Name|Description|
|-------|----------|
-|A < B|Less than operator. A is less than B.|
-|A <= B|Less than or equal operator. A is at most B.|
-|A > B|Greater than operator. A is larger than B.|
-|A >= B|Greater than or equal operator. A is at least B.|
-|A == B|Equality operator. A and B are the same|
-|A != B|Inequality operator. A and B are not the same.|
-|A && B|AND operator. Both A and B are true.|
-|A \|\| B|OR operator. One of A or B is true.|
-|!A|NOT operator. Check if A is not true.|
+|``A < B``|Less than operator. A is less than B.|
+|``A <= B``|Less than or equal operator. A is at most B.|
+|``A > B``|Greater than operator. A is larger than B.|
+|``A >= B``|Greater than or equal operator. A is at least B.|
+|``A == B``|Equality operator. A and B are the same.|
+|``A != B``|Inequality operator. A and B are not the same.|
+|``A && B``|AND operator. Both A and B are true.|
+|``A \|\| B``|OR operator. One of A or B is true.|
+|``!A``|NOT operator. Check if A is not true.|

### If-Else

@@ -514,7 +514,8 @@ which demonstrates the following:
This is technically the way C++ recommends to do File I/O in general.
In HPC, it doesn't get used very often, though. Most HPC programs use
-STDIO or POSIX. However, we introduce here anyway.
+STDIO or POSIX. However, we introduce it here anyway. It is located in
+[libstd.cc](https://github.com/grc-iit/grc-tutorial/blob/main/cpp/04-cpp-basic-syntax/src/libstd.cc). 
```cpp #include @@ -561,7 +562,7 @@ int main() { To compile & run the code: ```bash -cd ${GRC_TUTORIAL}/3.5.basics +cd ${GRC_TUTORIAL}/cpp/04-cpp-basic-syntax mkdir build cd build make @@ -576,7 +577,7 @@ Hello, World! ### STDIO The following example demonstrates the basics of the STDIO API. -The code is located in [libstd.cc](). +The code is located in [stdio.cc](https://github.com/grc-iit/grc-tutorial/blob/main/cpp/04-cpp-basic-syntax/src/stdio.cc). ```cpp #include @@ -641,7 +642,7 @@ int main() { To compile & run the code: ```bash -cd ${GRC_TUTORIAL}/3.5.basics +cd ${GRC_TUTORIAL}/cpp/04-cpp-basic-syntax mkdir build cd build make @@ -656,6 +657,7 @@ Hello, World! ### POSIX The following example demonstrates the basics of the POSIX API. +It is located in [posix.cc](https://github.com/grc-iit/grc-tutorial/blob/main/cpp/04-cpp-basic-syntax/src/posix.cc). ```cpp #include @@ -720,7 +722,7 @@ int main() { To compile & run the code: ```bash -cd ${GRC_TUTORIAL}/3.5.basics +cd ${GRC_TUTORIAL}/cpp/04-cpp-basic-syntax mkdir build cd build make @@ -811,7 +813,7 @@ int main() { To compile & run the code: ```bash -cd ${GRC_TUTORIAL}/3.5.basics +cd ${GRC_TUTORIAL}/cpp/04-cpp-basic-syntax mkdir build cd build make @@ -924,7 +926,7 @@ ends when both of these statements are no longer true. To get the dataset, run the following: ```bash -cd ${GRC_TUTORIAL}/3.5.basics +cd ${GRC_TUTORIAL}/cpp/04-cpp-basic-syntax mkdir build cd build make @@ -957,7 +959,7 @@ Average CO: 280 ``` Your Objectives: -1. Create a file called my_analyze_kitchen_fire.cc in the ${GRC_TUTORIAL}/3.5.basics directory +1. Create a file called my_analyze_kitchen_fire.cc in the ``${GRC_TUTORIAL}/cpp/04-cpp-basic-syntax`` directory 2. Edit the CMakeLists.txt in that directory to compile your code. Feel free to look at how the other sources in that directory were compiled. 3. How do you read "kitchen_fire.bin" and interpret its contents? 4. 
How do you analyze its contents to determine the start, end, and average

diff --git a/docs/02-hpc-tutorials/04-cpp-introduction/05-cpp-style-and-doc.md b/docs/02-hpc-tutorials/04-cpp-introduction/05-cpp-style-and-doc.md
index 2c0b02ba..29391900 100644
--- a/docs/02-hpc-tutorials/04-cpp-introduction/05-cpp-style-and-doc.md
+++ b/docs/02-hpc-tutorials/04-cpp-introduction/05-cpp-style-and-doc.md
@@ -153,7 +153,6 @@ It can be installed as follows:
python3 -m pip install cpplint
```
-We have the following code located [here]().
```cpp
// Copyright [year]

diff --git a/docs/02-hpc-tutorials/04-cpp-introduction/06-cpp-classes.md b/docs/02-hpc-tutorials/04-cpp-introduction/06-cpp-classes.md
index 0bbbb33e..cea0285c 100644
--- a/docs/02-hpc-tutorials/04-cpp-introduction/06-cpp-classes.md
+++ b/docs/02-hpc-tutorials/04-cpp-introduction/06-cpp-classes.md
@@ -133,7 +133,7 @@ public:
In this example, we overload the addition operator to perform complex number addition.

### 3.6.04.3. Relational Operators
-Relational operators (==, !=, <, >, <=, >=) can be overloaded to define custom comparison logic for objects of your class.
+Relational operators (``==``, ``!=``, ``<``, ``>``, ``<=``, ``>=``) can be overloaded to define custom comparison logic for objects of your class.

Example:

@@ -190,7 +190,7 @@ public:
In this example, we overload the function call operator to create an object that behaves like a function, adding two integers.

### Bitwise Operators
-Bitwise operators (&, |, ^, ~, <<, >>) can be overloaded to define custom bitwise operations for objects of your class.
+Bitwise operators (``&``, ``|``, ``^``, ``~``, ``<<``, ``>>``) can be overloaded to define custom bitwise operations for objects of your class. 
Example: diff --git a/docs/02-hpc-tutorials/05-docker/01-docker-basics.md b/docs/02-hpc-tutorials/05-docker/01-docker-basics.md index 193c2db3..c2ac4058 100644 --- a/docs/02-hpc-tutorials/05-docker/01-docker-basics.md +++ b/docs/02-hpc-tutorials/05-docker/01-docker-basics.md @@ -73,7 +73,7 @@ sudo docker build -t [IMAGE_NAME] [DOCKERFILE_DIR, can be a github link] -f [DOC 2. DOCKERFILE_DIR: the directory containing the Dockerfile. 3. DOCKERFILE_NAME: the name of the dockerfile in that directory. This is optional. Default: Dockerfile. -Let's say that our Dockerfile is located at ${HOME}/MyDockerfiles/Dockerfile. +Let's say that our Dockerfile is located at ``${HOME}/MyDockerfiles/Dockerfile``. We could build the image two ways: ``` # Option 1: a single command diff --git a/docs/02-hpc-tutorials/05-docker/02-docker-cluster.md b/docs/02-hpc-tutorials/05-docker/02-docker-cluster.md index d739eb30..8945aa9e 100644 --- a/docs/02-hpc-tutorials/05-docker/02-docker-cluster.md +++ b/docs/02-hpc-tutorials/05-docker/02-docker-cluster.md @@ -25,10 +25,10 @@ is a subdirectory of the current working directory. ```bash ssh-keygen -t rsa -f ${PWD}/id_rsa -N "" -q ``` -**-t rsa** uses RSA for the algorithm. -**-f ${PWD}/id_rsa** defines the output for the private key to be in this directory. -**-N ""** indicates no password should be generated. -**-q** disables interactive prompts. +* ``-t rsa`` uses RSA for the algorithm. +* ``-f ${PWD}/id_rsa`` defines the output for the private key to be in this directory. +* ``-N ""`` indicates no password should be generated. +* ``-q`` disables interactive prompts. ## OpenSSH-Server Dockerfile @@ -190,16 +190,13 @@ sudo docker-compose exec -u sshuser node2 hostname ``` These commands should print "node1" and "node2". -![docker-compose exec hostname results](images/5/5.2.7.docker-exec-hostname.png) Next, we will try performing ssh from one node into the other. 
```bash sudo docker-compose exec -u sshuser node1 ssh node2 hostname ``` -The above command will execute "ssh node2 hostname" in node1. Its -result should be: -![docker-compose exec ssh results](images/5/5.2.7.ssh-test.png) +The above command will execute "ssh node2 hostname" in node1. ## Interactive shell with cluster nodes diff --git a/docs/03-hermes/04-performance-analysis.md.bak b/docs/03-hermes/04-performance-analysis.md.bak deleted file mode 100644 index 4621a8f1..00000000 --- a/docs/03-hermes/04-performance-analysis.md.bak +++ /dev/null @@ -1,308 +0,0 @@ -The following experiments were conducted on the Ares cluster at IIT. -The scripts for these experiments are located -[here](https://github.com/lukemartinlogan/hermes_scripts/blob/master/). - -# 2.00. Experimental Setup -Ares has 32x compute nodes. Each compute node has two -2.2GHz Xeon Scalable Silver 4114 CPUs, totaling 48 cores per node. Each node -is additionally equipped with 40GB of DDR4-2600MHz RAM, a 250GB NVMe, -and a 480GB SATA SSD. - -We use the following Hermes configurations in the below experiments: -1. SSD: The baseline configuration contains only a SATA SSD, which has the least -performance. Up to 400GB capacity. -2. +NVMe: Add an NVMe of 150GB. -3. +RAM: Add main memory of 20GB. - -# 2.01. Device Benchmark - -In this experiment, we benchmark each storage device on compute nodes -individually using IOR and a custom sequential memory benchmark tool. -We vary the number of threads and dataset sizes. The objective of this -experiment is to understand the characteristics of the testbed's hardware. -This evaluation was conducted only over a single node. - -## 2.1.1. SSD - -### Strong scaling -First we perform a strong scaling study. -1. Total dataset size fixed at 100GB -2. Processes vary between 1 and 48 -3. 
Transfer size of 1MB - -| ![run workloads](../images/performance/ssd-scale.svg) | -|:--:| -|SSD strong scaling| - -Overall, we find that the SSD's performance doesn't change much with -increase in number of processes. The performance difference between -1 process and 48 processes is roughly 15% for Get, and negligible for Put. - -### Dataset Size -Next we measure the impact of dataset size on performance. This is because -we want to measure the impact of garbage collection and OS caching. -1. Total dataset size varies from 10GB to 100GB -2. Processes fixed at 4 -3. Transfer size of 1MB - -| ![run workloads](../images/performance/ssd-dset.svg) | -|:--:| -|SSD dataset size scaling| - -Here we see that the total dataset size does have impacts on performance, -and it's not all due to caching. For datasets of size 10GB, performance -of the SSD is roughly 1.3GBps for Get and 650MBps for Put. After 40GB -dataset size, the performance stagnates at 510MBps for both Put and Get. - -## 2.1.2. NVMe - -### Strong scaling -First we perform a strong scaling study. -1. Total dataset size fixed at 100GB -2. Processes vary between 1 and 48 -3. Transfer size of 1MB - -| ![run workloads](../images/performance/nvme-scale.svg) | -|:--:| -|NVMe strong scaling| - -Overall, we find that the NVMe's performance doesn't change much with -increase in number of processes. Unlike SATA SSD, the NVMe had no -difference as number of processes increased. - -### Dataset Size -Next we measure the impact of dataset size on performance. This is because -we want to measure the impact of garbage collection and OS caching. -1. Total dataset size varies from 10GB to 100GB -2. Processes fixed at 4 -3. Transfer size of 1MB - -| ![run workloads](../images/performance/nvme-dset.svg) | -|:--:| -|NVMe dataset size scaling| - -The dataset size had a dramatic change in performance with dataset size. -This also is not (entirely) due to caching. 
For 10GB dataset sizes, -the NVMe reaches nearly 2GBps for both Put and Get. For datasets larger -than 20GB, NVMe performance stagnates at roughly 390MBPs for Put and -1.1GBps for Get. - -### 2.1.3. RAM - -TODO. Place results here. - -# 2.02. Multi-Core I/O Scaling - -In this experiment, we measure the effect of multi-core scaling on the -performance of the NVMe configuration of Hermes. We vary the number of -processes to be between 1 and 48. Each case performs a total of 100GB of -I/O with transfer sizes of 1MB using the Hermes native Put/Get API. -This evaluation was conducted only over a single node. - -| ![multi-core scaling (NVMe)](../images/performance/multicore-nvme-scale.svg) | -|:--:| -|NVMe dataset size scaling| - -For NVMe, the performance of both PUT and GET operations were nearly the -same in performance as the evaluation conducted in 1.1. Hermes adds minimal -overhead when scaling the number of CPU cores. - -# 2.03. Yahoo Cloud Storage Benchmark (YCSB) - -In this experiment, we compare Hermes as a single-node key-value store -against other popular key-value stores. For this, we use the Yahoo -Cloud Storage Benchmark (YCSB). The YCSB comes with 8 workloads. -We focus on the first 4: -1. Workload A (Update-heavy): This workload generates a high number of update -requests compared to reads. It represents applications where data is updated -more frequently than it is read. -2. Workload B (Read-heavy): This workload generates a high number of read -requests compared to updates. It represents applications where data is read more -frequently than it is updated. -3. Workload C (Read-only): This workload generates only read requests. It -represents applications where data is read frequently but is never updated. -4. Workload D (Read-modify-write): This workload generates read, modify, and -write requests in equal proportions. It represents applications where data is -updated based on its current value. 
- -In addition, each workload contains a "Load" phase which perform insert-only -workloads. Unlike an update, insert replaces the entire record. - -| ![load workloads](../images/performance/ycsb-load.svg) | -|:--:| -|Performance of KVS for the LOAD phase of YCSB| - -In this workload, HermesKVS performs roughly 33% better on average than -all alternative KVS. This workload is particularly good for the Hermes -KVS adapter, since insert operations replace data. In our KVS, a record -(a tuple of values) is stored as a single blob. This translates almost -directly to a single Put operation in the Hermes KVS. This demonstrates -that Hermes can perform comparably to well-established in-memory KVS. - -| ![run workloads](../images/performance/ycsb-run.svg) | -|:--:| -|Performance of KVS for the RUN phase of YCSB| - -In these workloads, HermesKVS performs comparably to RocksDB and significantly -faster than memcached. - -RocksDB performs 15% faster than HermesKVS for the -update-heavy workload. This is because, as opposed to a log-structured merge -tree provided in RocksDB, HermesKVS relies on locking + modifying entries -directly. This is more costly, as HermesKVS must retrieve data and then -modify the data, as opposed to just transferring the updates. - -# 2.04. Multi-Tier I/O performance - -In this evaluation, we run a multi-tiered experiment using Hermes native -API. The workload sequentially PUTs 10GB per-node. We ran this experiment -with 16 nodes and 16 processes per node. The overall dataset size is 160GB. - -| ![run workloads](../images/performance/tiering.svg) | -|:--:| -|Performance of Hermes for varying Tiers| - -Overall, we see that with the addition of each tier, performance improvements -are observed. A 2.5x performance improvement is observed by adding NVMe. -An additional 2.5x performance improvement is observed by adding RAM. 
In -this case, the dataset fits entirely within a single tier, achieving nearly the -full bandwidth of that tier in this specific case. - -# 2.05. DPE Comparison - -Hermes comes with three Data Placement Engines (DPEs): -1. Round-Robin: Iterates over the set of targets fairly, dispersing blobs -evenly among the targets. -2. Random: Choose a target to buffer a blob at random. -3. Minimize I/O Time: place data in the fastest tier with space remaining. -Asynchronously flush data later. - - -To demonstrate the value of customized buffering, we focus on a high-bandwidth -synthetic workload. Each rank produces a total of 1GB of data, there are 16 -ranks per node, and a total of 4 nodes. The total dataset size produced is -160GB. We use a hierarchical setup with RAM, NVMe, and SATA SSD. - -| ![run workloads](../images/performance/dpe.svg) | -|:--:| -|Performance of Hermes for varying DPEs| - -Overall, we find that the choice of DPE has significant impact on overall -performance. The Minimize I/O Time DPE performs roughly 40% better than -Round Robin and 34% better than Random. This is because Round Robin and -Random disperse I/O requests among each storage device roughly evenly, -whereas Minimize I/O Time places data in the fastest available tiers. - -# 2.06. Data Staging Benefit - -Data staging can be used to load full or partial datasets in to the hiearchy -before the application begins. For applications which iterate over datasets -multiple times, staging can provide great performance benefits. - -To demonstrate the value of data staging, we use ior. The workload is -structured as follows: -1. 4 nodes, 16 processes per node -2. Generate a 256GB dataset using IOR -3. Stage in the 256GB dataset in Hermes -4. Run IOR read workload - -Hermes is configured as follows: -| Key | Parameter | -|:--:|:--:| -| RAM | 16GB (per node) | -| NVMe | 100GB (per node) | -| SSD | 150GB (per node) | - -We compare the performance of using staging to without using staging. 
The -results are shown in the figure below. - -| ![run workloads](images/performance/data-staging.svg) | -|:--:| -|Performance of Staging| - -In this case, 25% of the dataset is staged in RAM and 75% of the dataset -is staged in NVMe. Without staging, Hermes incurs the cost of reading data -from the PFS in addition to placing data into Hermes. In this case, the -performance benefit of staging as compared to no staging is roughly 4x. - -# 2.07. Prefetching Benefit - -TODO. Place results here. - -# 2.08. Grey-Scott Model - -The Grey-Scott Model is a 3D 7-point stencil code for modeling reaction -diffusion. The model contains the following parameters - -| Key | Parameter | -|:--:|:--:| -| L | This size of the global array (An L x L x L cube) | -| steps | Total number of steps | -| plotgap | Number of steps between output | -| Du | Diffusion coefficient of U in the mathematical model | -| Dv | Diffusion coefficient of V in the mathematical model | -| F | Feed rate of U | -| k | Kill rate of V | -| dt | Timestep | -| noise | Amount of noise to inject | -| output | Where to output data to | -| adios_config | The ADIOS2 XML file | - -In our case, we use the following configuration: -| Key | Parameter | -|:--:|:--:| -| L | 128 | -| steps | 200,000 | -| plotgap | 16 | - - -# 2.10. Metadata Performance - -## 2.10.1. Create Bucket - -TODO. Place the results here. - -### Single-Node - -TODO. Place the results here. - -## 2.10.2. Get Bucket - -TODO. Place the results here. - -## 2.10.3. Create Blob - -TODO. Place the results here. - -## 2.10.4. Destroy Blob - -TODO. Place the results here. - -## 2.10.5. Destroy Bucket - -TODO. Place the results here. - -# 2.11. 
Other Use Cases - -### Benchmarks - -- [HPC IO Benchmark Repository](https://github.com/hpc/ior) -- [VPIC](https://github.com/lanl/vpic) -- [DLIO Benchmark](https://github.com/hariharan-devarajan/dlio_benchmark) - -### By Hand I/O Buffering - - - [Simulation and visualization of thunderstorms, tornadoes, and - downbursts](http://orf.media/) - - [LOFS: A simple file system for massively parallel cloud - models](https://www.youtube.com/watch?v=bD-9lK2pvqA&list=PLPyhR4PdEeGYzF3rx1KZDDOxitBDSnGes&index=3) - -### References - -[CORAL-2 Benchmarks](https://asc.llnl.gov/coral-2-benchmarks/) - -# 2.12. Note on Performance - -Hermes is not currently optimized for small I/O workloads -- especially -for the filesystem adapters. We are working on removing some of the locks and -adding metadata caching.