- PyTorch (version >= 1.10.0)
- numpy
- pandas
- Install pytorch-sparse:
  ```bash
  git clone [link in my github repo]
  cd pytorch-sparse && git checkout a64fx_dev && python setup.py install
  ```
- Install pytorch-scatter:
  ```bash
  git clone [link in my github repo]
  cd pytorch-scatter && git checkout a64fx_dev && python setup.py install
  ```
- Install pytorch-geometric:
  ```bash
  git clone [link in my github repo]
  cd pytorch-geometric && git checkout zhuang_dev && python setup.py install
  ```
- Install ParMetis for graph partitioning
  - please follow the instructions in the ParMETIS Installation subsection of https://docs.dgl.ai/en/0.9.x/guide/distributed-partition.html#
- Install this framework:
  ```bash
  git clone [link]
  ```
  - Install the kernel:
    ```bash
    cd super_gnn/ops && python setup.py install
    ```
  - Install the framework:
    ```bash
    cd ../../ && python setup.py install
    ```
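- Optional sanity check (a minimal sketch; it assumes the forks install under the standard PyG package names and that this framework is importable as `super_gnn`):
  ```bash
  # Verify that the custom builds and the framework import cleanly.
  python -c "import torch; print('torch', torch.__version__)"
  python -c "import torch_scatter, torch_sparse, torch_geometric; print('PyG extensions OK')"
  python -c "import super_gnn; print('super_gnn OK')"
  ```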
- Preprocess the raw graph data:
  ```bash
  cd super_gnn/graph_partition/
  python preprocess_graph.py --dataset=${graph_name} --raw_dir=./dataset/ --processed_dir=${processed_dir} --is_undirected
  ```
  - `--dataset`: the name of the dataset, options: [ogbn-arxiv, ogbn-products, reddit, proteins, ogbn-papers100M, ogbn-mag240M]
  - `--raw_dir`: the root directory for saving the raw dataset
  - `--processed_dir`: the directory for saving the preprocessed dataset that will be used by the later graph partition step
  - `--is_undirected`: make the raw graph dataset undirected (for directed graphs)
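  For example, an illustrative run for ogbn-products (`./processed/` is a placeholder for ${processed_dir}; adjust the paths to your setup):
  ```bash
  # Preprocess ogbn-products; --raw_dir and --processed_dir are example paths.
  python preprocess_graph.py --dataset=ogbn-products --raw_dir=./dataset/ --processed_dir=./processed/ --is_undirected
  ```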
- Partition the graph with ParMetis
  - make sure you have set two environment variables for ParMetis:
    ```bash
    export PATH=$PATH:$HOME/local/bin
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/local/lib/
    ```
  - then run the partitioner from ${processed_dir}:
    ```bash
    cd ${processed_dir}
    mpirun -np ${num_procs} pm_dglpart ${graph_name} ${num_part_each_proc}
    ```
  - `num_procs`: the number of MPI processes for graph partitioning
  - `num_part_each_proc`: the number of subgraphs each MPI process generates, so the total number of subgraphs after partitioning is `num_procs * num_part_each_proc`
  - `graph_name`: the name of the dataset, options: [ogbn_arxiv, ogbn_products, reddit, proteins, ogbn_papers100M, ogbn_mag240M_paper_cites_paper]
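  For example, an illustrative run that partitions ogbn_products into 8 subgraphs using 4 MPI processes, each generating 2 parts (4 * 2 = 8):
  ```bash
  # 4 MPI processes x 2 parts per process = 8 subgraphs in total
  mpirun -np 4 pm_dglpart ogbn_products 2
  ```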
- With the previous steps, you will get the graph partition results in ${processed_dir}.
- Postprocess the graph partition results
  - go back to the directory super_gnn/graph_partition/, then run:
    ```bash
    python postprocess_graph_multi_proc.py -o ${out_dir} -ir ${in_raw_dir} -ip ${in_partition_dir} -g ${graph_name} -b 0 -e ${total_num_procs} -p ${num_process}
    ```
  - `out_dir`: the directory for saving the postprocessed dataset; the name of this folder must be set to ${graph_name}_${num_of_subgraphs}_part/
  - `in_raw_dir`: the directory where the preprocessed dataset is saved
  - `in_partition_dir`: the directory where the graph partition results are saved (${processed_dir})
  - `graph_name`: the name of the dataset, options: [ogbn_arxiv, ogbn_products, reddit, proteins, ogbn_papers100M, ogbn_mag240M_paper_cites_paper]
  - `-b`: the id of the beginning subgraph, default: 0
  - `-e`: the id of the ending subgraph, default: total_num_subgraphs
  - `num_process`: the number of processes spawned for postprocessing
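  For example, continuing the illustrative 8-subgraph run above (all paths are placeholders; note that `out_dir` must follow the ${graph_name}_${num_of_subgraphs}_part/ naming rule):
  ```bash
  # Placeholder paths: ./processed/ holds the preprocessed data and the ParMetis
  # output; ./ogbn_products_8_part/ is the postprocessed output directory.
  python postprocess_graph_multi_proc.py -o ./ogbn_products_8_part/ -ir ./processed/ -ip ./processed/ -g ogbn_products -b 0 -e 8 -p 4
  ```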
- With the previous three steps (preprocess, partition, postprocess), graph partitioning for a specific number of subgraphs is done. If you want a different number of subgraphs, repeat step 2 and step 3 and change the value of `num_procs * num_part_each_proc`.
- Partition the graph following the instructions in the graph partition subsection.
- Go back to the top directory of this project, then:
  ```bash
  cd examples/graphsage/
  ```
- Change the input data directory by modifying the yaml files in config/fugaku/. The input_dir must be set to the parent directory of the out_dir indicated in the previous (postprocess) step (see the example below). You can also change the model hyperparameters by modifying the yaml files.
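  As an illustration of that relationship (the paths below are placeholders, not values from this repo):
  ```bash
  # If the postprocess step wrote its output to
  #   ./partitions/ogbn_products_8_part/
  # then input_dir in config/fugaku/ogbn-products.yaml should point to its parent:
  #   input_dir: ./partitions/
  ```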
- Go back to examples/graphsage/, then use the following command to run full-batch GraphSAGE training:
  ```bash
  mpirun -np ${num_procs} python train.py --config=./config/fugaku/${graph_name}.yaml
  ```
  - `-np (num_procs)`: number of MPI processes for training
  - `--config`: the training config file located at config/fugaku/
  - `--num_bits`: number of bits for boundary node communication, options: [32, 16, 8, 4, 2], default: 32
  - `--is_pre_delay`: use pre-post aggregation for communication, options: [true, false], default: false
  - `--is_label_augment`: use label augmentation, options: [true, false], default: false
- Example: running full-batch GraphSAGE training on the ogbn-products dataset with 32 MPI processes, int2 for communication, pre-post aggregation for communication enabled, and label augmentation enabled:
  ```bash
  mpirun -np 32 python train.py --config=./config/fugaku/ogbn-products.yaml --num_bits=2 --is_pre_delay=true --is_label_augment=true
  ```
- Partition the graph following the instructions in the graph partition subsection.
- Change the input data directory by modifying the yaml files in config/fugaku/. The input_dir must be set to the parent directory of the out_dir indicated in the previous (postprocess) step.
- Go back to the top directory of this project, then run (see the example below):
  ```bash
  bash submit_scripts_at_fugaku/${graph_name}/submit_scripts/submit_${graph_name}_all.sh
  ```
  Replace ${graph_name} with the graph name, options: [ogbn-arxiv, ogbn-products, reddit, proteins, ogbn-papers100M, ogbn-mag240M_paper_cites_paper]
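  For example, substituting ogbn-products for ${graph_name}:
  ```bash
  bash submit_scripts_at_fugaku/ogbn-products/submit_scripts/submit_ogbn-products_all.sh
  ```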
Since we have updated the code a lot, the results from this repo might not match the results in our paper. To reproduce the ABCI results in our paper, please refer to the original repo: empty link
- Partition the graph following the instructions in the graph partition subsection.
- Change the input data directory by modifying the yaml files in config/abci/. The input_dir must be set to the parent directory of the out_dir indicated in the previous (postprocess) step.
- Go back to the top directory of this project, then run:
  ```bash
  bash submit_scripts_at_abci/batch_job_submission/submit_${graph_name}_all.sh
  ```
  Replace ${graph_name} with the graph name, options: [ogbn-arxiv, ogbn-products, reddit, proteins, ogbn-papers100M, ogbn-mag240M_paper_cites_paper]