zxmeng98/Sylvie

Sylvie: 3D-adaptive and Universal System for Large-scale Graph Neural Network Training

Directory Structure

|-- checkpoint   # model checkpoints
|-- dataset
|-- helper       # auxiliary codes
|-- module       # PyTorch modules
|-- partitions   # partitions of input graphs
|-- results      # experiment outputs
|-- scripts      # example scripts

Note that ./checkpoint/, ./dataset/, ./partitions/ and ./results/ are initially empty (or absent) and will be created when training is launched.

Setup

Environment

Hardware Dependencies

  • A CPU machine with at least 120 GB host memory
  • At least five NVIDIA GPUs (at least 24 GB of memory each)

Software Dependencies

Installation

Run with Docker

We have prepared a Docker image (coming soon) for Sylvie.

docker pull zxmeng98/sylvie
docker run --gpus all -it zxmeng98/sylvie

Datasets

We use Reddit, ogbn-products, Yelp and Amazon for evaluation. All datasets should be stored in ./dataset/. Reddit, ogbn-products and ogbn-papers100M are downloaded automatically by DGL or OGB. Yelp is preloaded in the Docker environment, and is available here.

Basic Usage

Core Training Options

  • --dataset: the dataset you want to use
  • --model: the model to use
  • --n-hidden: the number of hidden units
  • --n-layers: the number of model layers
  • --n-partitions: the number of partitions
  • --master-addr: the address of master server
  • --port: the network port for communication
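
The options above can be pictured as an ordinary argparse interface. The following is only a minimal sketch, not Sylvie's actual parser: the flag names come from the list above, while the types, the default values (including the model name "graphsage" and the port number) and the function name build_parser are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the core training options listed above.

    Only the flag names are taken from the README; every type and default
    below is an assumption.
    """
    parser = argparse.ArgumentParser(
        description="Sketch of the core training options (not the real parser)"
    )
    parser.add_argument("--dataset", type=str, default="reddit",
                        help="the dataset you want to use")
    parser.add_argument("--model", type=str, default="graphsage",
                        help="the model to use (default is an assumption)")
    parser.add_argument("--n-hidden", type=int, default=256,
                        help="the number of hidden units")
    parser.add_argument("--n-layers", type=int, default=4,
                        help="the number of model layers")
    parser.add_argument("--n-partitions", type=int, default=4,
                        help="the number of partitions")
    parser.add_argument("--master-addr", type=str, default="127.0.0.1",
                        help="the address of master server")
    parser.add_argument("--port", type=int, default=18118,
                        help="the network port for communication")
    return parser


# Parse an explicit argument list so the sketch runs without a command line.
args = build_parser().parse_args(
    ["--dataset", "reddit", "--n-partitions", "4", "--n-hidden", "128"]
)
# argparse converts dashes in flag names to underscores in attribute names.
print(args.dataset, args.n_partitions, args.n_hidden, args.master_addr)
```

In the repository itself, these flags are set inside the example scripts under scripts/, so adjusting a script is the usual way to change them.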

For example, after running bash scripts/reddit.sh, you will get output like this:

...
Process 002 | Epoch 00079 | Time(s) 0.7814 | Comm(s) 0.6886 | Reduce(s) 0.0415 | Loss 0.2291
Process 003 | Epoch 00079 | Time(s) 0.7816 | Comm(s) 0.6784 | Reduce(s) 0.0433 | Loss 0.3579
Process 000 | Epoch 00079 | Time(s) 0.7804 | Comm(s) 0.6784 | Reduce(s) 0.0422 | Loss 0.2293
Process 001 | Epoch 00079 | Time(s) 0.7816 | Comm(s) 0.6878 | Reduce(s) 0.0411 | Loss 0.0932
Epoch 00079 | Accuracy 93.31%
...
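
Output in this format is easy to post-process. The sketch below parses the per-process lines shown above and averages the timings across processes. It assumes that Comm(s) is counted as part of Time(s), which the README does not state; the names LOG_LINE and parse_log are hypothetical helpers, not part of Sylvie.

```python
import re
from statistics import mean

# Matches the per-process lines shown above; the summary "Accuracy" line
# does not start with "Process" and is therefore skipped.
LOG_LINE = re.compile(
    r"Process (?P<rank>\d+) \| Epoch (?P<epoch>\d+) \| "
    r"Time\(s\) (?P<time>[\d.]+) \| Comm\(s\) (?P<comm>[\d.]+) \| "
    r"Reduce\(s\) (?P<reduce>[\d.]+) \| Loss (?P<loss>[\d.]+)"
)


def parse_log(text: str) -> list:
    """Return one dict per per-process log line found in `text`."""
    records = []
    for line in text.splitlines():
        match = LOG_LINE.search(line)
        if match is None:
            continue  # ignore "..." and the accuracy summary line
        records.append({
            "rank": int(match["rank"]),
            "epoch": int(match["epoch"]),
            "time": float(match["time"]),
            "comm": float(match["comm"]),
            "reduce": float(match["reduce"]),
            "loss": float(match["loss"]),
        })
    return records


sample = """\
Process 002 | Epoch 00079 | Time(s) 0.7814 | Comm(s) 0.6886 | Reduce(s) 0.0415 | Loss 0.2291
Process 003 | Epoch 00079 | Time(s) 0.7816 | Comm(s) 0.6784 | Reduce(s) 0.0433 | Loss 0.3579
Process 000 | Epoch 00079 | Time(s) 0.7804 | Comm(s) 0.6784 | Reduce(s) 0.0422 | Loss 0.2293
Process 001 | Epoch 00079 | Time(s) 0.7816 | Comm(s) 0.6878 | Reduce(s) 0.0411 | Loss 0.0932
Epoch 00079 | Accuracy 93.31%
"""

records = parse_log(sample)
mean_time = mean(r["time"] for r in records)
# Share of epoch time spent communicating, under the assumption stated above.
comm_share = mean(r["comm"] / r["time"] for r in records)
print(f"processes={len(records)} mean_time={mean_time:.4f}s comm_share={comm_share:.1%}")
```

The same function can be applied to a saved log file by passing in the text read from that file.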

Run Experiments

To reproduce the experiments in our paper (e.g., throughput and accuracy in Tables 4 and 5), please run scripts/reddit.sh, scripts/ogbn-products.sh or scripts/yelp.sh. Users can adjust the options to reproduce the results for other settings. The outputs will be saved to the ./results/ directory.
