
Building GridPACK on Cori

Sumathi Lakshmiranganatha edited this page Aug 25, 2022 · 3 revisions

First, copy the gpbuild folder into the home directory:
rsync -rl --info=progress2 /global/common/software/m3363/gpbuild ~
There should be no need to edit any of the files in ~/gpbuild after copying, since they should already be set up to install into $HOME/gpbuild.
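A quick sanity check after the copy can be sketched as follows. The function name check_gpbuild_tree is hypothetical; the three directory names are the ones that the build steps on this page cd into, and the real folder may contain more than these.

```shell
# Hypothetical helper: report which expected subdirectories are missing
# under a gpbuild root (defaults to $HOME/gpbuild).
check_gpbuild_tree() {
    root="${1:-$HOME/gpbuild}"
    missing=0
    # Directory names taken from the build steps on this page.
    for d in ga-5.8.1 petsc-3.7.6 GridPACK; do
        if [ ! -d "$root/$d" ]; then
            echo "missing: $root/$d"
            missing=1
        fi
    done
    return "$missing"
}

# Usage: check_gpbuild_tree && echo "gpbuild tree looks complete"
```

The function returns non-zero if any directory is missing, so it can gate the rest of a build script.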

Build process

First, configure the environment:

cd ~/gpbuild
source module_csh

module list should print something like this:

  1) modules/                 9) cray-libsci/19.06.1   17) dvs/2.12_2.2.167-
  2) altd/2.0                10) udreg/2.3.2-          18) alps/6.6.58-
  3) darshan/3.2.1           11) ugni/                 19) rca/2.2.20-
  4) craype-network-aries    12) pmi/5.0.14            20) atp/2.1.3
  5) craype-haswell          13) dmapp/7.1.1-          21) PrgEnv-gnu/6.0.5
  6) cray-mpich/7.7.10       14) gni-headers/          22) boost/1.69.0
  7) gcc/8.3.0               15) xpmem/2.2.20-         23) cmake/3.21.3
  8) craype/2.6.2            16) job/2.2.4-            24) git/2.21.0
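
Rather than comparing the list by eye, the loaded modules can be checked from a script. This is a minimal sketch, assuming the module system exports the colon-separated LOADEDMODULES variable (the variable named in the CMake note later on this page). The function name missing_modules is hypothetical.

```shell
# Hypothetical helper: print each requested module that is not present in
# $LOADEDMODULES (a colon-separated list maintained by Environment Modules).
missing_modules() {
    for m in "$@"; do
        case ":${LOADEDMODULES:-}:" in
            *":$m:"*) ;;                 # exact name/version is loaded
            *) echo "not loaded: $m" ;;
        esac
    done
}

# Usage: missing_modules gcc/8.3.0 cray-mpich/7.7.10 boost/1.69.0 cmake/3.21.3
```

The match is on the exact name/version string, so a module loaded at a different version is reported as not loaded.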

1. GA build

cd ~/gpbuild/ga-5.8.1/
mkdir GA_shared
./configure --with-mpi-ts --disable-f77 --without-blas --enable-cxx --enable-i4 --prefix="$HOME/gpbuild/GA_shared" --enable-shared
make install

2. PETSc

cd ~/gpbuild/petsc-3.7.6
make clean
chmod +x ./build_csh
sbatch --wait build.job # Produces
make all

Modify the build_csh script in the GridPACK directory to update the library paths:

rm -rf CMake*

export BOOST_DIR=/usr/common/software/boost/1.69.0/gnu/haswell
export BOOST_INC=$BOOST_DIR/include

CFLAGS='-L/opt/cray/xpmem/default/lib64 -lxpmem -L/opt/cray/ugni/default/lib64 -lugni -L/opt/cray/udreg/default/lib64 -ludreg -L/opt/cray/pe/pmi/default/lib64 -lpmi' \
cmake -Wdev \
-D BOOST_ROOT:STRING='/usr/common/software/boost/1.69.0/gnu/haswell' \
-D CMAKE_TOOLCHAIN_FILE:STRING=$HOME/gpbuild/GridPACK/src/build/ToolChain.cmake \
-D PETSC_DIR:STRING="$HOME/gpbuild/petsc-3.7.6" \
-D PETSC_ARCH:STRING='cori-gnu-cxx-complex-opt' \
-D CMAKE_INSTALL_PREFIX:PATH="$HOME/gpbuild/GridPACK/src/gridpack-install" \

3. GridPACK

cd $HOME/gpbuild/GridPACK/src/build
rm -rf ../gridpack-install
mkdir ../gridpack-install
make clean
rm -rf CMake*
make; make install

This will install GridPACK to $HOME/gpbuild/GridPACK/src/gridpack-install

4. GridPACK Python Wrapper

cd ~/gpbuild/GridPACK/python
rm -rf build/*
rm -rf *.so
rm -rf gridpack_hadrec.egg-info/
python build
mkdir -p "$GRIDPACK_DIR/lib/python"
python install --home="$GRIDPACK_DIR"


PETSc (seems to be working)

salloc --nodes 1 --qos interactive --time 01:00:00 --constraint haswell -A m3363
cd ~/gpbuild/petsc-3.7.6/src/snes/examples/tutorials
export PETSC_DIR=$HOME/gpbuild/petsc-3.7.6
export PETSC_ARCH=cori-gnu-cxx-complex-opt 
make ex1
make ex3

Output of srun -n 1 ./ex1 -ksp_gmres_cgs_refinement_type refine_always -snes_monitor_short:

  0 SNES Function norm 6.04152 
  1 SNES Function norm 4.78676 
  2 SNES Function norm 2.98646 
  3 SNES Function norm 0.230624 
  4 SNES Function norm 0.00193631 
  5 SNES Function norm 1.43559e-07 
  6 SNES Function norm < 1.e-11
Number of SNES iterations = 6

Output of srun -n 3 ./ex3 -nox -pc_type asm -mat_type mpiaij -snes_monitor_cancel -snes_monitor_short -ksp_gmres_cgs_refinement_type refine_always:

 srun -n 3 ./ex3 -nox -pc_type asm -mat_type mpiaij -snes_monitor_cancel -snes_monitor_short -ksp_gmres_cgs_refinement_type refine_always 
atol=1e-50, rtol=1e-08, stol=1e-08, maxit=50, maxf=10000
  0 SNES Function norm 5.41468 
  1 SNES Function norm 0.295258 
  2 SNES Function norm 0.000450229 
  3 SNES Function norm 1.38967e-09 
Number of SNES iterations = 3
Norm of error 1.49751e-10 Iterations 3
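
To compare a fresh run against the outputs above without reading the whole log, the iteration count can be pulled out of the solver output. This is a minimal sketch; the function name snes_iterations is hypothetical, and it relies only on the "Number of SNES iterations = N" line shown in both outputs.

```shell
# Hypothetical helper: read solver output on stdin and print the value from
# the "Number of SNES iterations = N" line.
snes_iterations() {
    sed -n 's/^Number of SNES iterations = \([0-9][0-9]*\).*$/\1/p'
}

# Usage (inside an allocation):
#   srun -n 1 ./ex1 -ksp_gmres_cgs_refinement_type refine_always -snes_monitor_short | snes_iterations
```

For the runs recorded above, the expected values would be 6 for ex1 and 3 for ex3.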

GridPACK (seems to be working)

salloc --nodes 1 --qos interactive --time 01:00:00 --constraint haswell -A m3363
cd ~/gpbuild/GridPACK/src/build/applications/powerflow



### GridPACK math module configured on 3 processors                                
Maximum number of iterations: 50                                               
Convergence tolerance: 0.000001                                                
0: I have 46 buses and 59 branches                                             
1: I have 49 buses and 68 branches                                             
2: I have 45 buses and 63 branches                                             
 repeat time = 1                                                               
----------test Iteration 0, before PF solve, Tol: 9.638903e-02                 
  Residual norms for nrs_ solve.                                               
  0 KSP Residual norm 7.190983016757e-03                                       
  1 KSP Residual norm 1.347944633023e-03                                       
  2 KSP Residual norm 4.535626750145e-04                                       
  3 KSP Residual norm 2.917341728681e-04                                       
  4 KSP Residual norm 1.647921238149e-04                                       
  5 KSP Residual norm 7.897452726535e-05                                       
  6 KSP Residual norm 6.607054072768e-05                                       
  7 KSP Residual norm 5.371399518110e-05                                       
  8 KSP Residual norm 2.813124046284e-05                                       
  9 KSP Residual norm 1.594051825451e-05                                       
 10 KSP Residual norm 1.064548544847e-05                                       
 11 KSP Residual norm 8.490960299617e-06                                       
 12 KSP Residual norm 6.180701866024e-06                                       
 13 KSP Residual norm 4.020683912845e-06                                       
 14 KSP Residual norm 1.398606474076e-06                                       
 15 KSP Residual norm 8.626097626373e-07                                       
 16 KSP Residual norm 7.364346571727e-07                                       
 17 KSP Residual norm 6.848053732285e-07                                       
 18 KSP Residual norm 6.630520485578e-07                                       
 19 KSP Residual norm 6.560101273664e-07                                       
 20 KSP Residual norm 6.408068771526e-07                                       
 21 KSP Residual norm 5.938572278848e-07                                       
 22 KSP Residual norm 4.248234917688e-07                                       
 23 KSP Residual norm 2.594885010890e-07                                       
 24 KSP Residual norm 1.456851302217e-07                                       
 25 KSP Residual norm 7.026809525820e-08                                       
 26 KSP Residual norm 3.712644895932e-08                                       
 27 KSP Residual norm 1.436436895894e-08                                       
 28 KSP Residual norm 7.357363353615e-09                                       
KSP Object:(nrs_) 3 MPI processes                                              
  type: gmres                                                                  
    GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
    GMRES: happy breakdown tolerance 1e-30                                     
  maximum iterations=50                                                        
  tolerances:  relative=1e-12, absolute=1e-08, divergence=10000.               
  left preconditioning                                                         
  using nonzero initial guess                                                  
  using PRECONDITIONED norm type for convergence test                          
PC Object:(nrs_) 3 MPI processes                                               
  type: bjacobi                                                                
    block Jacobi: number of blocks = 3                                         
    Local solve is same for all blocks, in the following KSP and PC objects:   
  KSP Object:  (nrs_sub_)   1 MPI processes                                    
    type: preonly                                                              
    maximum iterations=10000, initial guess is zero                            
    tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.             
    left preconditioning                                                       
    using NONE norm type for convergence test                                  
  PC Object:  (nrs_sub_)   1 MPI processes                                     
    type: ilu                                                                  
      ILU: out-of-place factorization                                          
      0 levels of fill                                                         
      tolerance for zero pivot 2.22045e-14                                     
      matrix ordering: natural                                                 
      factor fill ratio given 1., needed 1.                                    
        Factored matrix follows:                                               
          Mat Object:           1 MPI processes                                
            type: seqaij                                                       
            rows=71, cols=71                                                   
            package used to perform factorization: petsc                       
            total: nonzeros=453, allocated nonzeros=453                        
            total number of mallocs used during MatSetValues calls =0          
              using I-node routines: found 40 nodes, limit used is 5           
    linear system matrix = precond matrix:                                     
    Mat Object:     1 MPI processes                                            
      type: seqaij                                                             
      rows=71, cols=71                                                         
      total: nonzeros=453, allocated nonzeros=1065                             
      total number of mallocs used during MatSetValues calls =0                
        using I-node routines: found 40 nodes, limit used is 5                 
  linear system matrix = precond matrix:                                       
  Mat Object:   3 MPI processes                                                
    type: mpiaij                                                               
    rows=201, cols=201                                                         
    total: nonzeros=1347, allocated nonzeros=5908                              
    total number of mallocs used during MatSetValues calls =0                  
      using I-node (on process 0) routines: found 40 nodes, limit used is 5    
Iteration 1 Tol: 9.638903e-02                                                  
  Residual norms for nrs_ solve.                                               
  0 KSP Residual norm 3.313863268360e-05                                       
  1 KSP Residual norm 1.604771965118e-05                                       
  2 KSP Residual norm 1.199185979790e-05                                       
  3 KSP Residual norm 7.960609461082e-06                                       
  4 KSP Residual norm 6.033927451473e-06                                       
  5 KSP Residual norm 4.482037948986e-06                                       
  6 KSP Residual norm 1.262973288967e-06                                       
  7 KSP Residual norm 9.394684496620e-07                                       
  8 KSP Residual norm 6.955684736309e-07                                       
  9 KSP Residual norm 4.966270714777e-07                                       
 10 KSP Residual norm 3.801378639361e-07                                       
 11 KSP Residual norm 3.159321420023e-07
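
The KSP residual history above is long, so a per-solve summary can be easier to compare between runs. This is a minimal sketch that assumes each linear solve starts with a "Residual norms for" line, as in the log above. The function name ksp_summary is hypothetical.

```shell
# Hypothetical helper: for each linear solve in the log read on stdin, print
# the number of "KSP Residual norm" lines and the last residual reported.
ksp_summary() {
    awk '
        /Residual norms for/ { if (n) print n, last; n = 0 }   # a new solve starts
        /KSP Residual norm/  { n++; last = $NF }
        END                  { if (n) print n, last }'
}

# Usage: pipe the application output into the helper, e.g.  ... | ksp_summary
```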

GridPACK Python Wrapper (seems to be working)

salloc --nodes 1 --qos interactive --time 01:00:00 --constraint haswell -A m3363
cd ~/gpbuild/GridPACK/python

export PYTHONPATH="${GRIDPACK_DIR}/lib/python/gridpack_hadrec-0.0.1-py3.9-linux-x86_64.egg:${PYTHONPATH}"  # add the egg's path to PYTHONPATH
export PATH=$HOME/gpbuild/GridPACK/src/gridpack-install/bin:$PATH
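
The egg file name includes the Python version, so the hard-coded name only matches one interpreter. A sketch of looking the egg up instead is shown here. The function name find_gridpack_egg is hypothetical, and it assumes GRIDPACK_DIR is already set and that the egg was installed into lib/python as in the build step.

```shell
# Hypothetical helper: print the path of the first gridpack_hadrec egg found
# under the given directory, or return non-zero if there is none.
find_gridpack_egg() {
    for egg in "$1"/gridpack_hadrec-*.egg; do
        if [ -e "$egg" ]; then
            echo "$egg"
            return 0
        fi
    done
    return 1
}

# Usage (assumes GRIDPACK_DIR is set as elsewhere on this page):
#   egg=$(find_gridpack_egg "$GRIDPACK_DIR/lib/python") && export PYTHONPATH="$egg:$PYTHONPATH"
```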

Output of srun -n 3 python src/ (seems to be working)

hello from process 0 of 3
hello from process 1 of 3
hello from process 2 of 3

GridPACK math module configured on 3 processors

Output of srun -n 3 python src/ (seems to be working)

process 0 of 3 executing task 0
process 0 of 3 executing task 3
process 0 of 3 executing task 6
process 0 of 3 executing task 9
process 0 of 3 executing task 12
process 0 of 3 executing task 15
process 0 of 3 executing task 18
process 0 of 3 executing task 21
process 0 of 3 executing task 24
process 0 of 3 executing task 27
process 0 of 3 executing task 30
process 0 of 3 executing task 33
process 0 of 3 executing task 36
process 0 of 3 executing task 39
process 0 of 3 executing task 42
process 0 of 3 executing task 45
process 0 of 3 executing task 48
process 0 of 3 executing task 51
process 0 of 3 executing task 54
process 0 of 3 executing task 57
process 0 of 3 executing task 60
process 0 of 3 executing task 63
process 0 of 3 executing task 66
process 0 of 3 executing task 69
process 0 of 3 executing task 72
process 0 of 3 executing task 75
process 0 of 3 executing task 78
process 0 of 3 executing task 81
process 0 of 3 executing task 84
process 0 of 3 executing task 87
process 0 of 3 executing task 90
process 0 of 3 executing task 93
process 0 of 3 executing task 96
process 0 of 3 executing task 99
process 1 of 3 executing task 2
process 1 of 3 executing task 4
process 1 of 3 executing task 8
process 1 of 3 executing task 11
process 1 of 3 executing task 14
process 1 of 3 executing task 17
process 1 of 3 executing task 20
process 1 of 3 executing task 23
process 1 of 3 executing task 26
process 1 of 3 executing task 29
process 1 of 3 executing task 32
process 1 of 3 executing task 35
process 1 of 3 executing task 38
process 1 of 3 executing task 41
process 1 of 3 executing task 44
process 1 of 3 executing task 47
process 1 of 3 executing task 49
process 1 of 3 executing task 52
process 1 of 3 executing task 55
process 1 of 3 executing task 58
process 1 of 3 executing task 61
process 1 of 3 executing task 64
process 1 of 3 executing task 67
process 1 of 3 executing task 70
process 1 of 3 executing task 73
process 1 of 3 executing task 76
process 1 of 3 executing task 80
process 1 of 3 executing task 83
process 1 of 3 executing task 86
process 1 of 3 executing task 89
process 1 of 3 executing task 92
process 1 of 3 executing task 95
process 1 of 3 executing task 98
process 2 of 3 executing task 1
process 2 of 3 executing task 5
process 2 of 3 executing task 7
process 2 of 3 executing task 10
process 2 of 3 executing task 13
process 2 of 3 executing task 16
process 2 of 3 executing task 19
process 2 of 3 executing task 22
process 2 of 3 executing task 25
process 2 of 3 executing task 28
process 2 of 3 executing task 31
process 2 of 3 executing task 34
process 2 of 3 executing task 37
process 2 of 3 executing task 40
process 2 of 3 executing task 43
process 2 of 3 executing task 46
process 2 of 3 executing task 50
process 2 of 3 executing task 53
process 2 of 3 executing task 56
process 2 of 3 executing task 59
process 2 of 3 executing task 62
process 2 of 3 executing task 65
process 2 of 3 executing task 68
process 2 of 3 executing task 71
process 2 of 3 executing task 74
process 2 of 3 executing task 77
process 2 of 3 executing task 79
process 2 of 3 executing task 82
process 2 of 3 executing task 85
process 2 of 3 executing task 88
process 2 of 3 executing task 91
process 2 of 3 executing task 94
process 2 of 3 executing task 97
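
A listing like the one above can be checked mechanically to confirm that every task ran exactly once across all processes. This is a minimal sketch; the function name check_tasks is hypothetical, and the task count (100 here, since the listing runs from task 0 to task 99) is an assumption passed as an argument.

```shell
# Hypothetical helper: read "process P of N executing task T" lines on stdin
# and verify that tasks 0..(ntasks-1) were each executed exactly once.
check_tasks() {
    awk -v ntasks="$1" '
        /executing task/ { seen[$NF]++ }
        END {
            bad = 0
            for (t = 0; t < ntasks; t++) {
                if (seen[t] != 1) {
                    printf "task %d executed %d times\n", t, seen[t]
                    bad = 1
                }
            }
            exit bad
        }'
}

# Usage: pipe the task output into the helper, e.g.  ... | check_tasks 100
```

The helper prints one line per task that was skipped or repeated and exits non-zero in that case.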

GridPACK math module configured on 3 processors

Output of srun python test (seems to be working)

running test
Searching for nose
Best match: nose 1.3.7
Processing nose-1.3.7-py3.7.egg

Using /global/u2/t/tflynn/gpbuild/GridPACK/python/.eggs/nose-1.3.7-py3.7.egg
running egg_info
writing gridpack_hadrec.egg-info/PKG-INFO
writing dependency_links to gridpack_hadrec.egg-info/dependency_links.txt
writing top-level names to gridpack_hadrec.egg-info/top_level.txt
reading manifest file 'gridpack_hadrec.egg-info/SOURCES.txt'
reading manifest template ''
writing manifest file 'gridpack_hadrec.egg-info/SOURCES.txt'
running build_ext
-- Cray Programming Environment 2.6.2 C
-- NOTE: LOADEDMODULES changed since initial config!
-- NOTE: this may cause unexpected build errors.
-- Cray Programming Environment 2.6.2 CXX
-- pybind11 v2.4.3
-- Configuring done
-- Generating done
-- Build files have been written to: /global/u2/t/tflynn/gpbuild/GridPACK/python/build/temp.linux-x86_64-3.7
[  0%] Built target parallel_scripts
Consolidate compiler generated dependencies of target gridpack
[ 50%] Linking CXX shared module ../../../
[100%] Built target gridpack
hadrec_test (tests.gridpack_test.GridPACKTester) ... ok
hello_test (tests.gridpack_test.GridPACKTester) ... ok
task_test (tests.gridpack_test.GridPACKTester) ... ok

Ran 3 tests in 4.924s


GridPACK math module configured on 1 processors


Make a new conda environment:

conda create --name gpenv --clone base
conda activate gpenv
conda install -c conda-forge gym 
pip install xmltodict
salloc --nodes 1 --qos interactive --time 01:00:00 --constraint haswell -A m3363
cd ~/powerGridEnv/src 
conda activate gpenv

Output of srun python (seems to be working)

/global/homes/t/tflynn/.conda/envs/gpenv/lib/python3.7/site-packages/gym/spaces/ UserWarning: WARN: Casting input x to numpy array.
  logger.warn("Casting input x to numpy array.")
[(0, 0, 1.0, 0.1), (0, 1, 1.0, 0.1), (0, 2, 1.0, 0.1), (0, 3, 1.0, 0.1), (0, 4, 1.0, 0.1), (0, 5, 1.0, 0.1), (0, 6, 1.0, 0.1), (0, 7, 1.0, 0.1), (0, 8, 1.0, 0.1)]
-----------------root path of the rlgc: /global/u2/t/tflynn/powerGridEnv
!!!!!!!!!-----------------start the env
------------------- Total steps: 50, Episode total reward without any load shedding actions:  -4601.344304438557
------------------- Total steps: 80, Episode total reward with manually provided load shedding actions:  -1472.2004908672507
volt_ob_noact.shape:  (51, 4)
--------- GridPACK HADREC APP MODULE deallocated ----------
!!!!!!!!!-----------------finished gridpack env testing

Output of srun python (seems to be working)

/global/homes/t/tflynn/.conda/envs/gpenv/lib/python3.7/site-packages/gym/spaces/ UserWarning: WARN: Casting input x to numpy array.
  logger.warn("Casting input x to numpy array.") MatplotlibDeprecationWarning: Adding an axes using the same arguments as a previous axes currently reuses the earlier instance.  In a future version, a new instance will always be created and returned.  Meanwhile, this warning can be suppressed, and the future behavior ensured, by passing a unique label to each axes instance.
  plt.subplot(121) MatplotlibDeprecationWarning: Adding an axes using the same arguments as a previous axes currently reuses the earlier instance.  In a future version, a new instance will always be created and returned.  Meanwhile, this warning can be suppressed, and the future behavior ensured, by passing a unique label to each axes instance.
[(0, 0, 1.0, 0.1), (0, 1, 1.0, 0.1), (0, 2, 1.0, 0.1), (0, 3, 1.0, 0.1), (0, 4, 1.0, 0.1), (0, 5, 1.0, 0.1), (0, 6, 1.0, 0.1), (0, 7, 1.0, 0.1), (0, 8, 1.0, 0.1)]
-----------------root path of the rlgc: /global/u2/t/tflynn/powerGridEnv
!!!!!!!!!-----------------start the env
finished loading npz file
ob_dim:  142 ac_dim:  34
finished loading the weights to the policy network
Fault tuple is  (0, 0, 1.0, 0.08)
-----one episode testing finished without AI-provided actions, total steps: 80 total reward:  -3623.717146067013
------------------- Episode total reward with AI provided actions:  -3623.717146067013
Fault tuple is  (0, 0, 1.0, 0.08)
-----one episode testing finished without any load shedding, total steps: 61 total reward:  -14496.355926569628
------------------- Episode total reward without any action:  -14496.355926569628
--------- GridPACK HADREC APP MODULE deallocated ----------
!!!!!!!!!-----------------finished gridpack env testing

Running the RL code on one node

Note that to get the script working on Cori, we had to make a few changes. See the branch Cori/powerGridEnv of this repo for a working version. These changes seem to have been needed for compatibility with the version of Ray that was available on Cori at the time of running this code.

salloc --nodes 1 --qos interactive --time 01:00:00 --constraint haswell -A m3363
conda activate gpenv
srun python --cores 6 --n_iter 20

Partial output:

2022-01-27 19:41:13,564 INFO -- View the Ray dashboard at
-----------------root path of the rlgc: /global/u2/t/tflynn/powerGridEnv
Trying to init with ip None and pw None...
Did the init!
This cluster consists of
        1 nodes in total
        64.0 CPU resources in total

Logging data to outputs_training/ars_39_bussys_1_pf_5_faultbus_1_dur_lstm_gridpack_v6/log.txt
Creating deltas table.
Created deltas table.
Initializing multidirection workers.
just set self.num_workers to 1

------------!!!! workers allocation:

------------!!!! total cores: 6  , total directions: 32 ,  onedirection_numofcasestorun:  5
------------!!!! num_workers: 1 ,  repeat:  32 , remain: 0


Initializing policy.
Initializing optimizer.
Initialization of ARS complete.
Total Time Initialize ARS: 11.715039491653442
select_faultbuses_id:   [4 3 0 2 1]
select_pfcases_id:   [0]
select_fault_cases_tuples:   [(0, 4, 1.0, 0.08), (0, 3, 1.0, 0.08), (0, 0, 1.0, 0.08), (0, 2, 1.0, 0.08), (0, 1, 1.0, 0.08)]
rollout_rewards shape: (32, 2)
deltas_idx shape: (32,)
Maximum reward of collected rollouts: -1674.5928014215574
---Time to print rollouts results:  0.00016307830810546875
---Full Time to generate rollouts:  25.364495038986206

----time to aggregate rollouts:  0.018156766891479492
Euclidean norm of update step: 33.69602337952102
g_hat shape, w_policy shape: (6112,) (6112,)
total time of one step 25.386450052261353
iter  0  done
[('cores', 6), ('decay', 0.9985), ('delta_std', 2), ('deltas_used', 16), ('dir_path', 'outputs_training/ars_39_bussys_1_pf_5_faultbus_1_dur_lstm_gridpack'), ('n_directions', 32), ('n_iter', 20), ('onedirection_numofcasestorun', 5), ('policy_file', ''), ('policy_network_size', [32, 32]), ('policy_type', 'LSTM'), ('rollout_length', 90), ('save_per_iter', 10), ('seed', 589), ('step_size', 1), ('tol_p', 0.001), ('tol_steps', 100)]
|            Time |            25.6 |
|       Iteration |               1 |
|   AverageReward |       -3.87e+03 |
|       reward 0: |       -3.94e+03 |
|       reward 1: |       -3.77e+03 |
|       reward 2: |       -3.93e+03 |
|       reward 3: |       -3.93e+03 |
|       reward 4: |       -3.76e+03 |
|       timesteps |        5.12e+03 |
total time of save: 0.20161938667297363
select_faultbuses_id:   [2 3 4 0 1]
select_pfcases_id:   [0]
select_fault_cases_tuples:   [(0, 2, 1.0, 0.08), (0, 3, 1.0, 0.08), (0, 4, 1.0, 0.08), (0, 0, 1.0, 0.08), (0, 1, 1.0, 0.08)]
rollout_rewards shape: (32, 2)
deltas_idx shape: (32,)
Maximum reward of collected rollouts: -1380.3128771101492
---Time to print rollouts results:  0.00010728836059570312
---Full Time to generate rollouts:  11.247551679611206

----time to aggregate rollouts:  0.0005881786346435547
Euclidean norm of update step: 35.93837568283732
g_hat shape, w_policy shape: (6112,) (6112,)
total time of one step 11.251289129257202
iter  1  done
total time of save: 1.430511474609375e-06
select_faultbuses_id:   [3 1 0 4 2]
select_pfcases_id:   [0]
select_fault_cases_tuples:   [(0, 3, 1.0, 0.08), (0, 1, 1.0, 0.08), (0, 0, 1.0, 0.08), (0, 4, 1.0, 0.08), (0, 2, 1.0, 0.08)]
rollout_rewards shape: (32, 2)
deltas_idx shape: (32,)
Maximum reward of collected rollouts: -1046.8336988165634
---Time to print rollouts results:  0.00015091896057128906
---Full Time to generate rollouts:  12.794855833053589

----time to aggregate rollouts:  0.0006389617919921875
Euclidean norm of update step: 36.19825363812717
g_hat shape, w_policy shape: (6112,) (6112,)
total time of one step 12.798492431640625
iter  2  done
total time of save: 9.5367431640625e-07
select_faultbuses_id:   [3 4 2 1 0]
select_pfcases_id:   [0]
select_fault_cases_tuples:   [(0, 3, 1.0, 0.08), (0, 4, 1.0, 0.08), (0, 2, 1.0, 0.08), (0, 1, 1.0, 0.08), (0, 0, 1.0, 0.08)]
rollout_rewards shape: (32, 2)
deltas_idx shape: (32,)
Maximum reward of collected rollouts: -1800.67744519864
---Time to print rollouts results:  0.0001068115234375
---Full Time to generate rollouts:  11.743767976760864

----time to aggregate rollouts:  0.0005624294281005859
Euclidean norm of update step: 34.23786550032807
g_hat shape, w_policy shape: (6112,) (6112,)
total time of one step 11.747586965560913
iter  3  done
total time of save: 9.5367431640625e-07
select_faultbuses_id:   [0 3 2 4 1]
select_pfcases_id:   [0]
select_fault_cases_tuples:   [(0, 0, 1.0, 0.08), (0, 3, 1.0, 0.08), (0, 2, 1.0, 0.08), (0, 4, 1.0, 0.08), (0, 1, 1.0, 0.08)]
rollout_rewards shape: (32, 2)
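
To follow training progress without scrolling through the full log, the per-iteration maximum reward can be extracted. This is a minimal sketch that assumes the log contains one "Maximum reward of collected rollouts" line per iteration, as in the partial output above. The function name max_rollout_rewards is hypothetical.

```shell
# Hypothetical helper: print "<iteration index> <value>" for each
# "Maximum reward of collected rollouts: X" line read on stdin.
max_rollout_rewards() {
    awk -F': *' '/^Maximum reward of collected rollouts/ { print i++, $2 }'
}

# Usage: pipe the training output (or a saved copy of it) into the helper,
# e.g.  ... | max_rollout_rewards
```

The index counts matching lines from zero in the order they appear, which lines up with the "iter N done" markers only if no iteration is missing from the log.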

Running the RL code on multiple nodes

Our scripts for running on multiple nodes are in the ./cori folder in this repo. They are based on the example Ray submission scripts. The only change from the example scripts is that conda activate gpenv must be run before the ray commands are executed. See commit 66bcbcc7a2f1777dcc308d966cc6d35677e09efd of this repository for a working configuration.

cd ~/powerGridEnv/cori

