User Manual
Siran Yang edited this page Jul 9, 2019 · 2 revisions
This section introduces the usage of tf_euler, the Euler graph learning algorithm package. Once the graph data has been prepared for Euler, a model can be trained and evaluated, and its embeddings saved, from the command line:
python -m tf_euler --data_dir <data_dir> --mode <mode> [flags]
- --data_dir, graph data directory, required.
- --mode, operation mode, one of train / evaluate / save_embedding, the default is train.
- --train_node_type, the node type in the training set, the default is 0.
- --all_node_type, the node type in the whole data set, the default is -1.
- --train_edge_type, the edge types in the training set, the default is [0].
- --all_edge_type, the edge types in the whole data set, the default is [0, 1].
- --max_id, the largest node id in the graph, required.
- --feature_idx, the id of the dense feature, required when using dense features.
- --feature_dim, the dimension of the dense feature, required when using dense features.
- --label_idx, the id of the label in the dense features, required for supervised models.
- --label_dim, the dimension of the label in the dense features, required for supervised models.
- --num_classes, the number of classes, required if the label is a scalar.
- --id_file, the id file of the test data, one id per line, required for evaluation.
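The one-id-per-line layout described for --id_file can be produced with a few lines of Python. This is a minimal sketch: the helper name `write_id_file`, the file name `test_ids.txt`, and the ids themselves are hypothetical and only illustrate the format.

```python
import os
import tempfile


def write_id_file(path, node_ids):
    """Write node ids to `path`, one id per line."""
    with open(path, "w") as f:
        for node_id in node_ids:
            f.write(f"{int(node_id)}\n")


# Hypothetical test ids; real ids come from your own train/test split.
test_ids = [3, 17, 42]
id_path = os.path.join(tempfile.mkdtemp(), "test_ids.txt")
write_id_file(id_path, test_ids)

# Read the file back to confirm the layout.
with open(id_path) as f:
    lines = f.read().splitlines()
print(lines)
```

The resulting path is what would be passed as --id_file when running with --mode evaluate.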
- --model_dir, checkpoint path, the default is ckpt.
- --batch_size, batch size, the default is 512.
- --optimizer, optimizer, the default is adam.
- --learning_rate, learning rate, the default is 0.01.
- --num_epochs, the number of training epochs, the default is 10.
- --log_steps, the number of steps between log outputs, the default is 20.
- --model, model name, one of line / randomwalk / graphsage / graphsage_supervised / scalable_gcn / gat / saved_embedding.
- --dim, embedding dimension, the default is 128.
- --sigmoid_loss / --nosigmoid_loss, loss function, the default is --sigmoid_loss.
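As an illustration of how the three modes fit together, the sketch below assembles one command per mode against the same data directory and checkpoint path. It only builds and prints the argument lists; nothing is executed. The helper `tf_euler_command`, the path `/path/to/graph_data`, the value 9999 for --max_id, and the choice of the line model are assumptions for the example, and a real run may need further flags required by the chosen model (for example the feature or label flags).

```python
import shlex


def tf_euler_command(data_dir, mode, max_id, model, model_dir="ckpt", extra=()):
    """Build the argv list for one tf_euler invocation (nothing is executed)."""
    cmd = [
        "python", "-m", "tf_euler",
        "--data_dir", data_dir,
        "--mode", mode,
        "--max_id", str(max_id),
        "--model", model,
        "--model_dir", model_dir,
    ]
    cmd.extend(extra)
    return cmd


# Hypothetical values: replace with your own data directory and largest node id.
common = dict(data_dir="/path/to/graph_data", max_id=9999, model="line")

commands = [
    tf_euler_command(mode="train", **common),
    # Evaluation additionally needs the id file of the test data.
    tf_euler_command(mode="evaluate", extra=["--id_file", "test_ids.txt"], **common),
    tf_euler_command(mode="save_embedding", **common),
]

for cmd in commands:
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Each list can be handed to `subprocess.run(cmd, check=True)` once tf_euler is installed and the data is in place. Keeping --model_dir identical across the three steps lets evaluate and save_embedding pick up the checkpoint written by train.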
unsupervised model
unsupervised model
unsupervised model / supervised model / supervised model
- --fanouts, the number of neighbors expanded per layer, the default is [10, 10].
- --aggregator, aggregator type, one of gcn / mean / meanpool / maxpool, the default is mean.
- --concat / --noconcat, aggregation method (see GraphSAGE), the default is --concat.
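To get a feel for what --fanouts implies for the size of a batch, the following sketch counts how many nodes are touched per root node. It assumes GraphSAGE-style fixed-size sampling, where every node at layer k draws exactly fanouts[k] neighbors and duplicates are counted; the function name is hypothetical and the result should be read as an upper bound, not as a description of tf_euler internals.

```python
def sampled_nodes_per_root(fanouts):
    """Nodes touched per root: 1 + f1 + f1*f2 + ... (duplicates counted)."""
    total = 1   # the root node itself
    layer = 1   # number of nodes in the current layer
    for fanout in fanouts:
        layer *= fanout
        total += layer
    return total


fanouts = [10, 10]   # default --fanouts
batch_size = 512     # default --batch_size

per_root = sampled_nodes_per_root(fanouts)   # 1 + 10 + 100 = 111
per_batch = per_root * batch_size            # 111 * 512 = 56832
print(per_root, per_batch)
```

Because the count grows with the product of the fanouts, adding a third layer with fanout 10 would raise the per-root figure from 111 to 1111 under the same assumption.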
supervised model
- --head_num, the number of attention heads, the default is 1.
supervised model
Imports the embedding.npy file in model_dir as a dense feature and builds an LR model on it to evaluate the quality of unsupervised models.
tf_euler uses the ParameterServer architecture for distributed training; refer to Distributed TensorFlow. The graph engine automatically splits the data and shares it between workers. Note that in distributed training the data must be placed on HDFS.
- --euler_zk_addr, ZooKeeper service address, required for distributed training.
- --euler_zk_path, ZooKeeper znode path, required for distributed training.
- --worker_hosts, worker list, required for distributed training.
- --ps_hosts, ps list, required for distributed training.
- --task_name, task name, ps or worker, required for distributed training.
- --task_index, task index, required for distributed training.
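In a distributed run every ps and worker process is started with the same cluster description and differs only in --task_name and --task_index. The sketch below generates one command per task. It assumes that --worker_hosts and --ps_hosts take comma-separated host:port entries, as is conventional in Distributed TensorFlow; that format, the host names, the ZooKeeper address and znode path, and the HDFS path are all hypothetical placeholders, and the helper `distributed_commands` is not part of tf_euler.

```python
import shlex


def distributed_commands(base_args, worker_hosts, ps_hosts, zk_addr, zk_path):
    """Return (host, argv) pairs, one per ps and worker task."""
    shared = list(base_args) + [
        "--euler_zk_addr", zk_addr,
        "--euler_zk_path", zk_path,
        # Assumed format: comma-separated host:port entries.
        "--worker_hosts", ",".join(worker_hosts),
        "--ps_hosts", ",".join(ps_hosts),
    ]
    commands = []
    for task_name, hosts in (("ps", ps_hosts), ("worker", worker_hosts)):
        for task_index, host in enumerate(hosts):
            cmd = ["python", "-m", "tf_euler"] + shared + [
                "--task_name", task_name,
                "--task_index", str(task_index),
            ]
            commands.append((host, cmd))
    return commands


# Hypothetical cluster: one parameter server and two workers.
tasks = distributed_commands(
    base_args=["--data_dir", "hdfs://namenode/path/to/graph_data", "--mode", "train"],
    worker_hosts=["worker0.example:2222", "worker1.example:2222"],
    ps_hosts=["ps0.example:2222"],
    zk_addr="zk.example:2181",
    zk_path="/euler",
)

for host, cmd in tasks:
    print(host, "->", " ".join(shlex.quote(arg) for arg in cmd))
```

Each printed command is meant to be launched on the host shown next to it; other required flags such as --max_id are left out of `base_args` here for brevity.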