```
|-- checkpoint   # model checkpoints
|-- dataset      # graph datasets
|-- helper       # auxiliary code
|-- module       # PyTorch modules
|-- partitions   # partitions of input graphs
|-- results      # experiment outputs
|-- scripts      # example scripts
```
Note that `./checkpoint/`, `./dataset/`, `./partitions/`, and `./results/` are empty folders at the beginning and will be created when training is launched.
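The folder creation noted above can be sketched as follows (a minimal illustration, not the repo's own code — the training scripts handle this automatically):

```python
# Ensure the working folders exist before launching training.
# makedirs with exist_ok=True is a no-op if a folder is already there.
import os

for d in ("checkpoint", "dataset", "partitions", "results"):
    os.makedirs(d, exist_ok=True)
```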
- A CPU machine with at least 120 GB of host memory
- At least five NVIDIA GPUs (24 GB of memory or more each)
- Ubuntu 20.04
- Python 3.9
- CUDA 11.7
- PyTorch 1.10
- A customized build of DGL 0.9.0
- OGB 1.3.4
We have prepared a Docker image (coming soon) for Sylvie:

```shell
docker pull zxmeng98/sylvie
docker run --gpus all -it zxmeng98/sylvie
```
We use Reddit, ogbn-products, Yelp, and Amazon for evaluation. All datasets are supposed to be stored in `./dataset/`. Reddit, ogbn-products, and ogbn-papers100M will be downloaded by DGL or OGB automatically. Yelp is preloaded in the Docker environment, and is available here.
- `--dataset`: the dataset to use
- `--model`: the model to use
- `--n-hidden`: the number of hidden units
- `--n-layers`: the number of model layers
- `--n-partitions`: the number of partitions
- `--master-addr`: the address of the master server
- `--port`: the network port for communication
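For illustration, the options above could be wired into an argument parser like the sketch below. The script structure and default values here are assumptions for demonstration; only the flag names come from the list above.

```python
# Hypothetical argument-parser sketch mirroring the flags documented above.
import argparse

def build_parser():
    p = argparse.ArgumentParser(description="distributed GNN training (sketch)")
    p.add_argument("--dataset", type=str, default="reddit",
                   help="the dataset to use")
    p.add_argument("--model", type=str, default="graphsage",
                   help="the model to use")
    p.add_argument("--n-hidden", type=int, default=256,
                   help="the number of hidden units")
    p.add_argument("--n-layers", type=int, default=4,
                   help="the number of model layers")
    p.add_argument("--n-partitions", type=int, default=4,
                   help="the number of partitions")
    p.add_argument("--master-addr", type=str, default="127.0.0.1",
                   help="the address of the master server")
    p.add_argument("--port", type=int, default=18118,
                   help="the network port for communication")
    return p

# Example invocation (hyphens in flag names become underscores in attributes).
args = build_parser().parse_args(
    ["--dataset", "reddit", "--n-partitions", "4", "--port", "18118"]
)
print(args.dataset, args.n_partitions, args.port)
```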
For example, after running `bash scripts/reddit.sh`, you will get output like this:
```
...
Process 002 | Epoch 00079 | Time(s) 0.7814 | Comm(s) 0.6886 | Reduce(s) 0.0415 | Loss 0.2291
Process 003 | Epoch 00079 | Time(s) 0.7816 | Comm(s) 0.6784 | Reduce(s) 0.0433 | Loss 0.3579
Process 000 | Epoch 00079 | Time(s) 0.7804 | Comm(s) 0.6784 | Reduce(s) 0.0422 | Loss 0.2293
Process 001 | Epoch 00079 | Time(s) 0.7816 | Comm(s) 0.6878 | Reduce(s) 0.0411 | Loss 0.0932
Epoch 00079 | Accuracy 93.31%
...
```
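If you want to post-process such logs (e.g., to average per-process timings for a throughput comparison), a small script like the sketch below works. This is not part of the repo; it only assumes the log line format shown in the sample output.

```python
# Hedged sketch: parse per-process epoch lines and average the timings.
import re

LOG = """\
Process 002 | Epoch 00079 | Time(s) 0.7814 | Comm(s) 0.6886 | Reduce(s) 0.0415 | Loss 0.2291
Process 003 | Epoch 00079 | Time(s) 0.7816 | Comm(s) 0.6784 | Reduce(s) 0.0433 | Loss 0.3579
Process 000 | Epoch 00079 | Time(s) 0.7804 | Comm(s) 0.6784 | Reduce(s) 0.0422 | Loss 0.2293
Process 001 | Epoch 00079 | Time(s) 0.7816 | Comm(s) 0.6878 | Reduce(s) 0.0411 | Loss 0.0932
"""

# Capture the epoch time and communication time from each process line.
pattern = re.compile(r"Time\(s\) ([\d.]+) \| Comm\(s\) ([\d.]+)")
times, comms = zip(*((float(t), float(c)) for t, c in pattern.findall(LOG)))

avg_time = sum(times) / len(times)  # mean epoch time across processes
avg_comm = sum(comms) / len(comms)  # mean communication time across processes
print(f"avg epoch time {avg_time:.4f}s, avg comm {avg_comm:.4f}s")
```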
To reproduce the experiments in our paper (e.g., throughput and accuracy in Tables 4 and 5), please run `scripts/reddit.sh`, `scripts/ogbn-products.sh`, or `scripts/yelp.sh`. Users can adjust the options to reproduce results for other settings. The outputs will be saved to the `./results/` directory.