-
Notifications
You must be signed in to change notification settings - Fork 62
tutorials quick start
Standalone Mode Quick Start Tutorial#
GraphStorm provides a set of tools, which can help users to use built-in datasets as examples to quickly learn the general steps of using GraphStorm.
GraphStorm is designed for easy-to-use GML models, particularly the graph neural network (GNN) models. Users only need to perform three operations:
-
Prepare Graph dataset in the required format as inputs of GraphStorm;
-
Launch GraphStorm training scripts and save the best models;
-
Launch GraphStorm inference scripts with saved models to predict.
This tutorial will use GraphStorm’s built-in OGB-arxiv dataset for a node classification task to demonstrate these three steps.
Note
All commands below are designed to run in a GraphStorm Docker container. Please refer to the GraphStorm Docker environment setup to prepare your environment.
If you set up the GraphStorm environment with pip Packages, please replace all occurrences of “2222” in the argument --ssh-port
with 22, and clone GraphStorm toolkits.
Download and Partition OGB-arxiv Data#
First run the below command.
python3 /graphstorm/tools/partition_graph.py --dataset ogbn-arxiv \ --filepath /tmp/ogbn-arxiv-nc/ \ --num-parts 1 \ --output /tmp/ogbn_arxiv_nc_1p
This command will automatically download ogbn-arxiv graph data and split the graph into one partition. Outcomes of the command are a set of files saved in the /tmp/ogbn_arxiv_nc_1p/
folder, as shown below.
/tmp/ogbn_arxiv_nc_1p: ogbn-arxiv.json node_mapping.pt edge_mapping.pt |- part0: edge_feat.dgl graph.dgl node_feat.dgl
The ogbn-arxiv.json
file contains meta data about the built distributed DGL graph. Because the command specifies to create one partition with the argument --num-parts 1
, there is one sub-folder, named part0
. Files in the sub-folder includes three types of data, i.e., the graph structure (graph.dgl
), the node features (node_feat.dgl
), and edge features (edge_feat.dgl
). The node_mapping.pt
and edge_mapping.pt
contain the ID mapping between the raw node and edge IDs with the built graph’s node and edge IDs.
Launch Training#
GraphStorm currently relies on ssh to launch its scripts. Therefore before launch any scripts, users need to create an IP address file, which contains all private IP addresses in a cluster. If run GraphStorm in the Standalone mode, which run only in a single machine, as this tutorial does, users only need to run the following command to create an ip_list.txt
file that has one row ‘127.0.0.1’ as its content.
touch /tmp/ogbn-arxiv-nc/ip_list.txt echo 127.0.0.1 > /tmp/ogbn-arxiv-nc/ip_list.txt
Then run the below command to start a training job that trains an built-in RGCN model to perform node classification on the OGB-arxiv.
python3 -m graphstorm.run.gs_node_classification \ --workspace /tmp/ogbn-arxiv-nc \ --num-trainers 1 \ --num-servers 1 \ --num-samplers 0 \ --part-config /tmp/ogbn_arxiv_nc_1p/ogbn-arxiv.json \ --ip-config /tmp/ogbn-arxiv-nc/ip_list.txt \ --ssh-port 2222 \ --cf /graphstorm/training_scripts/gsgnn_np/arxiv_nc.yaml \ --save-model-path /tmp/ogbn-arxiv-nc/models
This command uses GraphStorm’s training scripts and default settings defined in the /graphstorm/training_scripts/gsgnn_np/arxiv_nc.yaml file. It will train an RGCN model by 10 epochs and save the model files after each epoch at the /tmp/ogbn-arxiv-nc/models
folder whose contents are like the below structure.
/tmp/ogbn-arxiv-nc/models |- epoch-0 model.bin node_sparse_emb.bin optimizers.bin |- epoch-1 ... |- epoch-9
In a single AWS g4dn.12xlarge instance, it takes around 8 seconds to finish one training and evaluation epoch by using 1 GPU.
Launch inference#
The output log of the training command also show which epoch achieves the best performance on the validation set. Users can use the saved model in this best performance epoch, e.g., epoch-7, to do inference.
The inference command is:
python3 -m graphstorm.run.gs_node_classification \ --inference \ --workspace /tmp/ogbn-arxiv-nc \ --num-trainers 1 \ --num-servers 1 \ --num-samplers 0 \ --part-config /tmp/ogbn_arxiv_nc_1p/ogbn-arxiv.json \ --ip-config /tmp/ogbn-arxiv-nc/ip_list.txt \ --ssh-port 2222 \ --cf /graphstorm/training_scripts/gsgnn_np/arxiv_nc.yaml \ --save-prediction-path /tmp/ogbn-arxiv-nc/predictions/ \ --restore-model-path /tmp/ogbn-arxiv-nc/models/epoch-7/
This inference command predicts the classes of nodes in the testing set and saves the results, a Pytorch tensor file named “predict-0.pt”, into the /tmp/ogbn-arxiv-nc/predictions/
folder.
That is it! You have learnt how to use GraphStorm in three steps.
Next users can check the Use Your Own Graph Data tutorial to prepare your own graph data for using GraphStorm.
Clean Up#
Once finished with GML tasks, users can exit the GraphStorm Docker container with command exit
and then stop the container to restore computation resources.
Run this command in the container running environment to leave the GraphStorm container.
exit
Run this command in the instance environment to stop the GprahStorm Docker container.
docker stop test
Make sure you give the correct container name in the above command. Here it stops the container named test
.
Then users can use this command to check the status of all Docker containers. The container with the name test
should have a “STATUS” like “Exited (0) ** ago”.
docker ps -a
Get Started
- Environment Setup
- Standalone Mode Quick Start Tutorial
- Use Your Own Data Tutorial
- GraphStorm Configurations
Scale to Giant Graphs
Advanced Topics