ComDetective

Requirement

  • Linux kernel version 5.0.0 or higher
  • To call the perf_event_open system call without sudo access, set perf_event_paranoid to -1 with the following command: sudo sysctl -w kernel.perf_event_paranoid=-1
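Both requirements can be checked before installing anything. The following is a minimal bash sketch, not part of ComDetective: the function names `version_at_least` and `check_requirements` are made up for illustration, and it assumes GNU `sort` (for `-V`) and a readable `/proc/sys/kernel/perf_event_paranoid`.

```shell
# Hypothetical helpers; not shipped with ComDetective.

# version_at_least <have> <want>: succeeds if dotted version <have> is >= <want>.
version_at_least() {
    local have="$1" want="$2"
    # sort -V orders version strings; <have> is new enough if <want> sorts first (or is equal).
    [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)" = "$want" ]
}

# check_requirements: prints one line per requirement and returns non-zero if any check fails.
check_requirements() {
    local kernel paranoid status=0
    kernel="$(uname -r | cut -d- -f1)"   # e.g. "5.15.0-91-generic" becomes "5.15.0"
    if version_at_least "$kernel" "5.0.0"; then
        echo "kernel $kernel: ok"
    else
        echo "kernel $kernel: older than 5.0.0"
        status=1
    fi
    paranoid="$(cat /proc/sys/kernel/perf_event_paranoid 2>/dev/null || echo unknown)"
    if [ "$paranoid" = "-1" ]; then
        echo "perf_event_paranoid=$paranoid: ok"
    else
        echo "perf_event_paranoid=$paranoid: run 'sudo sysctl -w kernel.perf_event_paranoid=-1'"
        status=1
    fi
    return "$status"
}
```

After sourcing the snippet, run `check_requirements` to print the status of both checks.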

Installation

  1. (On AMD) Install the Linux kernel module for IBS from https://github.com/ParCoreLab/AMD_IBS_Toolkit

  2. Install hpctoolkit-externals from https://github.com/WitchTools/hpctoolkit-externals by typing the following command in the directory of hpctoolkit-externals: ./configure && make && make install

  3. Install the custom libmonitor from https://github.com/WitchTools/libmonitor by typing the following command in the directory of libmonitor: ./configure --prefix=<libmonitor-installation directory> && make && make install

  4. Install HPCToolkit with ComDetective extensions from https://github.com/ParCoreLab/hpctoolkit, pointing it to the installations of hpctoolkit-externals and libmonitor from steps 2 and 3. Assuming that the underlying architecture is x86_64 and the compiler is gcc, run the following commands:

    a. ./configure --prefix=<targeted installation directory for ComDetective> --with-externals=<directory of hpctoolkit externals>/x86_64-unknown-linux-gnu --with-libmonitor=<libmonitor-installation directory>

    b. make

    c. make install
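Steps 2 to 4 can be collected into a single plan. The sketch below only prints the commands and does not run them. The function name `print_install_plan` and its three arguments are hypothetical, and it assumes that the three repositories were cloned into directories named hpctoolkit-externals, libmonitor, and hpctoolkit. The AMD-only step 1 is not covered.

```shell
# Hypothetical helper; prints the commands for steps 2-4 on x86_64 with gcc.
# Usage: print_install_plan <externals_dir> <libmonitor_prefix> <comdetective_prefix>
print_install_plan() {
    local externals="$1" libmonitor="$2" prefix="$3"
    # Each line runs in a subshell so that the cd does not leak into the next step.
    cat <<EOF
(cd hpctoolkit-externals && ./configure && make && make install)
(cd libmonitor && ./configure --prefix=$libmonitor && make && make install)
(cd hpctoolkit && ./configure --prefix=$prefix --with-externals=$externals/x86_64-unknown-linux-gnu --with-libmonitor=$libmonitor && make && make install)
EOF
}
```

Review the printed commands before running them, for example with `print_install_plan "$PWD/hpctoolkit-externals" "$HOME/libmonitor-install" "$HOME/comdetective-install"`.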

Usage on Intel

  1. To run ComDetective with the default configuration (sampling period: 500K, bulletin board size: 127, number of watchpoints: 4, and output folder name "<timestamp>_timestamped_results"):

ComDetectiverun <./your_executable> your_args

  2. To run ComDetective with a custom configuration (user-chosen sampling period, bulletin board size, number of watchpoints, minimum size of data objects to be detected, and name of output folder):

ComDetectiverun --period <sampling_period> --bulletin-board-size <bulletin_board_size> --debug-register-size <number_of_watchpoints> --object-size-threshold <object_size_threshold> --output <name_of_output_folder> <./your_executable> your_args

or

ComDetectiverun -p <sampling_period> -b <bulletin_board_size> -d <number_of_watchpoints> -t <object_size_threshold> -o <name_of_output_folder> <./your_executable> your_args

  3. To monitor a program that has multiple processes (e.g. an MPI program):

mpirun -n <number_of_processes> ComDetectiverun <./your_executable> your_args

Usage on AMD

  1. To run ComDetective on an AMD machine with a user-chosen sampling period:

hpcrun -e WP_AMD_COMM -e IBS_OP@<sampling_period> -o <name_of_output_folder> <./your_executable> your_args

Attribution to Locations in Source Code

  1. Compile the code that you want to profile with the "-g" flag so that the binary contains debugging information.

  2. To attribute the detected communications to their locations in source code lines and program stacks, you need to take the following steps:

a. Download and extract a binary release of hpcviewer from http://hpctoolkit.org/download/hpcviewer/latest/hpcviewer-linux.gtk.x86_64.tgz

b. Run ComDetective on a program to be profiled

On Intel:

ComDetectiverun --output <name_of_output_folder> <./your_executable> your_args

On AMD:

hpcrun -e WP_AMD_COMM -e IBS_OP@<sampling_period> -o <name_of_output_folder> <./your_executable> your_args

c. Extract the static program structure from the profiled program by using hpcstruct

hpcstruct <./your_executable>

The output of hpcstruct is <./your_executable>.hpcstruct.

d. Generate an experiment result database using hpcprof

hpcprof -S <./your_executable>.hpcstruct -o <name_of_database> <name_of_output_folder>

The output of hpcprof is a folder named <name_of_database>.

e. Use hpcviewer to read the content of the experiment result database in a GUI

hpcviewer/hpcviewer <name_of_database>

Information on program stacks and source code lines is available in the Scope column, and the communication counts detected for the corresponding program stacks and source code lines are available under the "COMMUNICATION:Sum (I)" column.
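The Intel variant of steps b to e can be summarized as a short plan. The sketch below prints the commands without running them. The function name `print_attribution_plan` and its arguments are hypothetical. It assumes that hpcprof takes the measurement folder written by ComDetective as its positional argument and that hpcviewer accepts the database folder as an argument.

```shell
# Hypothetical helper; prints the attribution workflow (Intel) for review.
# Usage: print_attribution_plan <executable> <measurements_dir> <database_dir>
print_attribution_plan() {
    local exe="$1" measurements="$2" database="$3"
    # b. Profile the program; results go to the measurements folder.
    echo "ComDetectiverun --output $measurements $exe"
    # c. Extract the static program structure.
    echo "hpcstruct $exe"
    # d. Build the experiment database (assumption: measurements folder is the positional argument).
    echo "hpcprof -S $exe.hpcstruct -o $database $measurements"
    # e. Open the database in the viewer extracted in step a.
    echo "hpcviewer/hpcviewer $database"
}
```

For example, `print_attribution_plan ./app app_results app_db` prints the four commands in order for an executable named ./app.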

Communication Matrices and Communication Ranks of Data Objects

Communication matrices and rankings of data objects are dumped to the output folder. If you don't pass a name for the output folder with the "--output" or "-o" parameter, the output folder is named "<timestamp>_timestamped_results".

Each application-level matrix file is named as follows: <executable name>-<pid of the process>-<matrix type>_matrix.csv. Each data-object-level matrix file is named as follows: <executable name>-<pid of the process>-<object id>-<matrix type>_matrix_rank_<object rank>.csv.

<matrix type> can be one of the following:

  • "as" for any communication among threads
  • "ts" for true sharing among threads
  • "fs" for false sharing among threads
  • "as_core" for any communication among cores
  • "ts_core" for true sharing among cores
  • "fs_core" for false sharing among cores

<object id> is associated with the corresponding data object's name in the file <executable name>-<pid of the process>-<matrix type>_object_ranking.txt. In this text file, all data objects are ranked by the count of communications of the type indicated by <matrix type>. Total communication counts are printed in the log file named <executable name>-*.log within the output folder.
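The naming scheme above can be expressed as a few shell helpers, which is handy when a script has to locate a specific matrix in the output folder. This is a minimal sketch and the function names are hypothetical. Only the file name patterns and the six matrix types come from the description above.

```shell
# Hypothetical helpers that build ComDetective output file names.

# is_matrix_type <type>: succeeds for the six documented matrix types.
is_matrix_type() {
    case "$1" in
        as|ts|fs|as_core|ts_core|fs_core) return 0 ;;
        *) return 1 ;;
    esac
}

# app_matrix_file <executable_name> <pid> <matrix_type>
app_matrix_file() {
    is_matrix_type "$3" || return 1
    printf '%s-%s-%s_matrix.csv\n' "$1" "$2" "$3"
}

# object_matrix_file <executable_name> <pid> <object_id> <matrix_type> <object_rank>
object_matrix_file() {
    is_matrix_type "$4" || return 1
    printf '%s-%s-%s-%s_matrix_rank_%s.csv\n' "$1" "$2" "$3" "$4" "$5"
}

# object_ranking_file <executable_name> <pid> <matrix_type>
object_ranking_file() {
    is_matrix_type "$3" || return 1
    printf '%s-%s-%s_object_ranking.txt\n' "$1" "$2" "$3"
}
```

For an executable named app running as process 4242, `app_matrix_file app 4242 fs` yields the name of the application-level false sharing matrix among threads. An unknown matrix type makes the helpers return a non-zero status and print nothing.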

Attribution of Communications to Data Objects

Please note that if you enable attribution of communications to data objects (by following the instructions in ComDetective.Install) and track every dynamic memory allocation by passing the "-t 0" parameter to ComDetective, the profiled program can run slowly.

The reason is that ComDetective intercepts each call to a dynamic memory allocation function and records the allocated memory range in its database of dynamic objects. If the profiled program makes such calls very frequently (for example, millions of malloc calls), this bookkeeping slows ComDetective down. To work around the problem, restrict ComDetective to large objects. For example, instead of capturing every dynamic memory allocation (with the command line parameter "-t 0"), capture only dynamic memory allocations of 10000 bytes or more (with the command line parameter "-t 10000").
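When experimenting with different thresholds, it can help to build the command line in one place and reject malformed values early. The sketch below is illustrative only: `threshold_cmd` is a hypothetical name, and the function prints the command instead of running it. It uses only the "-t" parameter described above.

```shell
# Hypothetical helper; prints a ComDetectiverun command with an object size threshold.
# Usage: threshold_cmd <threshold_in_bytes> <executable> [args...]
threshold_cmd() {
    local threshold="$1"
    shift
    # Accept only non-negative integers, such as 0 or 10000.
    case "$threshold" in
        ''|*[!0-9]*)
            echo "threshold must be a non-negative integer (bytes)" >&2
            return 1
            ;;
    esac
    echo "ComDetectiverun -t $threshold $*"
}
```

For example, `threshold_cmd 10000 ./your_executable your_args` prints a command that captures only allocations of 10000 bytes or more, while `threshold_cmd 0 ./your_executable your_args` prints the slower variant that captures every allocation.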