```
|-- checkpoint   # model checkpoints
|-- dataset      # graph datasets
|-- helper       # auxiliary code
|-- module       # PyTorch modules
|-- partitions   # partitions of input graphs
|-- results      # experiment outputs
|-- scripts      # example scripts
```
Note that `./checkpoint/`, `./dataset/`, `./partitions/`, and `./results/` are empty folders at the beginning and will be created when training is launched.
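The folder creation noted above can be sketched as follows (a minimal illustration, not the repo's own code — the training scripts handle this automatically):

```python
# Ensure the working folders exist before launching training.
# makedirs with exist_ok=True is a no-op if a folder is already there.
import os

for d in ("checkpoint", "dataset", "partitions", "results"):
    os.makedirs(d, exist_ok=True)
```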
- A CPU machine with at least 120 GB of host memory
- At least five NVIDIA GPUs (24 GB of memory or more each)
- Ubuntu 20.04
- Python 3.9
- CUDA 11.7
- PyTorch 1.10
- A customized build of DGL 0.9.0
- OGB 1.3.4
We have prepared a Docker image (coming soon) for Sylvie:

```shell
docker pull zxmeng98/sylvie
docker run --gpus all -it zxmeng98/sylvie
```
We use Reddit, ogbn-products, Yelp, and Amazon for evaluation. All datasets are supposed to be stored in `./dataset/`. Reddit, ogbn-products, and ogbn-papers100M will be downloaded by DGL or OGB automatically. Yelp is preloaded in the Docker environment, and is available here.
- `--dataset`: the dataset to use
- `--model`: the model to use
- `--n-hidden`: the number of hidden units
- `--n-layers`: the number of model layers
- `--n-partitions`: the number of partitions
- `--master-addr`: the address of the master server
- `--port`: the network port for communication
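For illustration, the options above could be wired into an argument parser like the sketch below. The script structure and default values here are assumptions for demonstration; only the flag names come from the list above.

```python
# Hypothetical argument-parser sketch mirroring the flags documented above.
import argparse

def build_parser():
    p = argparse.ArgumentParser(description="distributed GNN training (sketch)")
    p.add_argument("--dataset", type=str, default="reddit",
                   help="the dataset to use")
    p.add_argument("--model", type=str, default="graphsage",
                   help="the model to use")
    p.add_argument("--n-hidden", type=int, default=256,
                   help="the number of hidden units")
    p.add_argument("--n-layers", type=int, default=4,
                   help="the number of model layers")
    p.add_argument("--n-partitions", type=int, default=4,
                   help="the number of partitions")
    p.add_argument("--master-addr", type=str, default="127.0.0.1",
                   help="the address of the master server")
    p.add_argument("--port", type=int, default=18118,
                   help="the network port for communication")
    return p

# Example invocation (hyphens in flag names become underscores in attributes).
args = build_parser().parse_args(
    ["--dataset", "reddit", "--n-partitions", "4", "--port", "18118"]
)
print(args.dataset, args.n_partitions, args.port)
```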
For example, after running `bash scripts/reddit.sh`, you will get output like this:
```
...
Process 002 | Epoch 00079 | Time(s) 0.7814 | Comm(s) 0.6886 | Reduce(s) 0.0415 | Loss 0.2291
Process 003 | Epoch 00079 | Time(s) 0.7816 | Comm(s) 0.6784 | Reduce(s) 0.0433 | Loss 0.3579
Process 000 | Epoch 00079 | Time(s) 0.7804 | Comm(s) 0.6784 | Reduce(s) 0.0422 | Loss 0.2293
Process 001 | Epoch 00079 | Time(s) 0.7816 | Comm(s) 0.6878 | Reduce(s) 0.0411 | Loss 0.0932
Epoch 00079 | Accuracy 93.31%
...
```
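If you want to post-process such logs (e.g., to average per-process timings for a throughput comparison), a small script like the sketch below works. This is not part of the repo; it only assumes the log line format shown in the sample output.

```python
# Hedged sketch: parse per-process epoch lines and average the timings.
import re

LOG = """\
Process 002 | Epoch 00079 | Time(s) 0.7814 | Comm(s) 0.6886 | Reduce(s) 0.0415 | Loss 0.2291
Process 003 | Epoch 00079 | Time(s) 0.7816 | Comm(s) 0.6784 | Reduce(s) 0.0433 | Loss 0.3579
Process 000 | Epoch 00079 | Time(s) 0.7804 | Comm(s) 0.6784 | Reduce(s) 0.0422 | Loss 0.2293
Process 001 | Epoch 00079 | Time(s) 0.7816 | Comm(s) 0.6878 | Reduce(s) 0.0411 | Loss 0.0932
"""

# Capture the epoch time and communication time from each process line.
pattern = re.compile(r"Time\(s\) ([\d.]+) \| Comm\(s\) ([\d.]+)")
times, comms = zip(*((float(t), float(c)) for t, c in pattern.findall(LOG)))

avg_time = sum(times) / len(times)  # mean epoch time across processes
avg_comm = sum(comms) / len(comms)  # mean communication time across processes
print(f"avg epoch time {avg_time:.4f}s, avg comm {avg_comm:.4f}s")
```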
To reproduce the experiments in our paper (e.g., throughput and accuracy in Tables 4 and 5), please run `scripts/reddit.sh`, `scripts/ogbn-products.sh`, or `scripts/yelp.sh`. Users can adjust the options to reproduce results for other settings. The outputs will be saved to the `./results/` directory.