Merge branch 'main' into add_wg_sparse_emb_rebased
chang-l authored Jan 13, 2024
2 parents afa3d05 + aacf520 commit f8ab9df
Showing 64 changed files with 3,000 additions and 100 deletions.
1 change: 1 addition & 0 deletions .github/workflow_scripts/e2e_check.sh
@@ -8,6 +8,7 @@ sh ./tests/end2end-tests/create_data.sh
sh ./tests/end2end-tests/tools/test_mem_est.sh
sh ./tests/end2end-tests/data_process/test.sh
sh ./tests/end2end-tests/data_process/movielens_test.sh
sh ./tests/end2end-tests/data_process/homogeneous_test.sh
sh ./tests/end2end-tests/custom-gnn/run_test.sh
bash ./tests/end2end-tests/graphstorm-nc/test.sh
bash ./tests/end2end-tests/graphstorm-lp/test.sh
1 change: 1 addition & 0 deletions .github/workflow_scripts/lint_check.sh
@@ -7,6 +7,7 @@ python3 -m pip install --upgrade prospector pip
yes | pip3 install astroid==v3.0.0
FORCE_CUDA=1 python3 -m pip install -e '.[test]' --no-build-isolation
pylint --rcfile=./tests/lint/pylintrc ./python/graphstorm/data/*.py
pylint --rcfile=./tests/lint/pylintrc ./python/graphstorm/distributed/
pylint --rcfile=./tests/lint/pylintrc ./python/graphstorm/dataloading/
pylint --rcfile=./tests/lint/pylintrc ./python/graphstorm/gconstruct/
pylint --rcfile=./tests/lint/pylintrc ./python/graphstorm/config/
2 changes: 1 addition & 1 deletion docker/sagemaker/Dockerfile.sm
@@ -46,7 +46,7 @@ ENV PYTHONPATH="/opt/ml/code/graphstorm/python/:${PYTHONPATH}"
RUN cp /opt/ml/code/graphstorm/sagemaker/run/* /opt/ml/code/

# Download DGL source code
RUN cd /root; git clone https://github.com/dmlc/dgl.git; cd dgl; git checkout -b 1.1.0 1.1.0
RUN cd /root; git clone https://github.com/dmlc/dgl.git
# Un-comment if we prefer a local DGL distribution
# COPY dgl /root/dgl
ENV PYTHONPATH="/root/dgl/tools/:${PYTHONPATH}"
2 changes: 2 additions & 0 deletions docs/requirements.txt
@@ -1,5 +1,7 @@
sphinx==7.1.2
sphinx-rtd-theme==1.3.0
nbsphinx
pandoc
--extra-index-url https://download.pytorch.org/whl/cpu
torch==1.13.1+cpu
-f https://data.dgl.ai/wheels-internal/repo.html
2 changes: 1 addition & 1 deletion docs/source/advanced/own-models.rst
@@ -272,7 +272,7 @@ The GraphStorm trainers can have evaluators and task trackers associated. The fo
config.early_stop_strategy)
trainer.setup_evaluator(evaluator)
# Optional: set up a task tracker to show the progress of training.
tracker = GSSageMakerTaskTracker(config)
tracker = GSSageMakerTaskTracker(config.eval_frequency)
trainer.setup_task_tracker(tracker)
GraphStorm's `evaluators <https://github.com/awslabs/graphstorm/blob/main/python/graphstorm/eval/evaluator.py>`_ could help to compute the required evaluation metrics, such as ``accuracy``, ``f1``, ``mrr``, and etc. Users can select the proper evaluator and use the trainer's ``setup_evaluator()`` method to attach them. GraphStorm's `task trackers <https://github.com/awslabs/graphstorm/blob/main/python/graphstorm/tracker/graphstorm_tracker.py>`_ serve as log collectors, which are used to show the process information.
1 change: 1 addition & 0 deletions docs/source/conf.py
@@ -35,6 +35,7 @@
"sphinx.ext.autosummary",
"sphinx.ext.coverage",
"sphinx.ext.mathjax",
"nbsphinx",
]
templates_path = ['_templates']
exclude_patterns = []
15 changes: 5 additions & 10 deletions docs/source/configuration/configuration-run.rst
@@ -126,11 +126,6 @@ GraphStorm provides a set of parameters to control how and where to save and res
- Yaml: ``task_tracker: sagemaker_task_tracker``
- Argument: ``--task_tracker sagemaker_task_tracker``
- Default value: ``sagemaker_task_tracker``
- **log_report_frequency**: The frequency of reporting model performance metrics through task_tracker. The frequency is defined by using number of iterations, i.e., every N iterations the evaluation metrics will be reported. (Please note the evaluation metrics should be generated at the reporting iteration. See "eval_frequency" for how evaluation frequency is controlled.)

- Yaml: ``log_report_frequency: 1000``
- Argument: ``--log-report-frequency 1000``
- Default value: ``1000``
- **restore_model_path**: A path where GraphStorm model parameters were saved. For training, if restore_model_path is set, GraphStom will retrieve the model parameters from restore_model_path instead of initializing the parameters. For inference, restore_model_path must be provided.

- Yaml: ``restore_model_path: /model/checkpoint/``
@@ -278,7 +273,7 @@ GraphStorm provides a set of parameters to control model evaluation.
- Yaml: ``use_mini_batch_infer: false``
- Argument: ``--use-mini-batch-infer false``
- Default value: ``true``
- **eval_frequency**: The frequency of doing evaluation. GraphStorm trainers do evaluation at the end of each epoch. However, for large-scale graphs, training one epoch may take hundreds of thousands of iterations. One may want to do evaluations in the middle of an epoch. When eval_frequency is set, every **eval_frequency** iterations, the trainer will do evaluation once. The evaluation results can be printed and reported. See **log_report_frequency** for more details.
- **eval_frequency**: The frequency of doing evaluation. GraphStorm trainers do evaluation at the end of each epoch. However, for large-scale graphs, training one epoch may take hundreds of thousands of iterations. One may want to do evaluations in the middle of an epoch. When eval_frequency is set, every **eval_frequency** iterations, the trainer will do evaluation once. The evaluation results can be printed and reported.

- Yaml: ``eval_frequency: 10000``
- Argument: ``--eval-frequency 10000``
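
The schedule described for ``eval_frequency`` can be sketched in plain Python. This is only an illustration, not GraphStorm's trainer code: the helper ``evaluation_points`` is hypothetical, and it assumes the iteration counter keeps running across epochs rather than resetting at each epoch boundary.

```python
def evaluation_points(iters_per_epoch, num_epochs, eval_frequency=None):
    """List the points at which an evaluation would run.

    Hypothetical helper for illustration only; not part of GraphStorm.
    Assumption: iterations are counted globally across epochs.
    """
    points = []
    total_iters = 0
    for epoch in range(num_epochs):
        for it in range(1, iters_per_epoch + 1):
            total_iters += 1
            # Mid-epoch evaluation every `eval_frequency` iterations, if set.
            if eval_frequency and total_iters % eval_frequency == 0:
                points.append((epoch, it, "iteration"))
        # Evaluation at the end of every epoch, regardless of eval_frequency.
        points.append((epoch, iters_per_epoch, "epoch_end"))
    return points


# Two epochs of five iterations each, evaluating every two iterations.
schedule = evaluation_points(iters_per_epoch=5, num_epochs=2, eval_frequency=2)
```

When ``eval_frequency`` is left unset, the sketch only evaluates at epoch ends, which matches the default behaviour described above.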
@@ -381,20 +376,20 @@ Classification and Regression Task

Node Classification/Regression Specific
.........................................
- **target_ntype**: (**Required**) The node type for prediction.
- **target_ntype**: The node type for prediction.

- Yaml: ``target_ntype: movie``
- Argument: ``--target-ntype movie``
- Default value: This parameter must be provided by user.
- Default value: For heterogeneous input graph, this parameter must be provided by the user. If not provided, GraphStorm will assume the input graph is a homogeneous graph and set ``target_ntype`` to "_N".

Edge Classification/Regression Specific
..........................................
- **target_etype**: (**Required**) The list of canonical edge types that will be added as a training target in edge classification/regression tasks, for example ``--train-etype query,clicks,asin`` or ``--train-etype query,clicks,asin query,search,asin``. A canonical edge type should be formatted as `src_node_type,relation_type,dst_node_type`. Currently, GraphStorm only supports single task edge classification/regression, i.e., it only accepts one canonical edge type.
- **target_etype**: The list of canonical edge types that will be added as training targets in edge classification/regression tasks, for example ``--train-etype query,clicks,asin`` or ``--train-etype query,clicks,asin query,search,asin``. A canonical edge type should be formatted as `src_node_type,relation_type,dst_node_type`. Currently, GraphStorm only supports single task edge classification/regression, i.e., it only accepts one canonical edge type.

- Yaml: ``target_etype:``
| ``- query,clicks,asin``
- Argument: ``--target-etype query,clicks,asin``
- Default value: This parameter must be provided by user.
- Default value: For heterogeneous input graph, this parameter must be provided by the user. If not provided, GraphStorm will assume the input graph is a homogeneous graph and set ``target_etype`` to ("_N", "_E", "_N").
- **remove_target_edge_type**: When set to true, GraphStorm removes target_etype in message passing, i.e., any edge with target_etype will not be sampled during training and inference.

- Yaml: ``remove_target_edge_type: false``
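
The new default behaviour for ``target_ntype`` and ``target_etype`` can be sketched as below. This is a minimal illustration under stated assumptions, not GraphStorm's configuration code: the function names and constants are hypothetical, and only the fallback values ``"_N"`` and ``("_N", "_E", "_N")`` and the ``src_node_type,relation_type,dst_node_type`` format come from the documentation text in this diff.

```python
# Fallback types that the documentation says are used for homogeneous graphs.
DEFAULT_NTYPE = "_N"
DEFAULT_ETYPE = ("_N", "_E", "_N")


def resolve_target_ntype(configured=None):
    """Return the configured node type, or the homogeneous default.

    Hypothetical helper for illustration only.
    """
    return configured if configured is not None else DEFAULT_NTYPE


def resolve_target_etype(configured=None):
    """Return canonical edge types as (src, relation, dst) tuples.

    `configured` is a list of "src,relation,dst" strings, or None.
    Hypothetical helper for illustration only.
    """
    if configured is None:
        return [DEFAULT_ETYPE]
    return [tuple(etype.split(",")) for etype in configured]


# Heterogeneous graph: the user supplies the target types explicitly.
hetero_etypes = resolve_target_etype(["query,clicks,asin"])
# Homogeneous graph: nothing supplied, so the defaults apply.
homo_ntype = resolve_target_ntype()
homo_etypes = resolve_target_etype()
```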
8 changes: 8 additions & 0 deletions docs/source/index.rst
@@ -35,6 +35,14 @@ Welcome to the GraphStorm Documentation and Tutorials
scale/distributed
scale/sagemaker

.. toctree::
:maxdepth: 1
:caption: Programming User Guide
:hidden:
:glob:

notebooks/Notebook_0_Data_Prepare

.. toctree::
:maxdepth: 1
:caption: Advanced Topics