[Enhancement] change the input argument of GSTaskTrackerAbc to be an integer (#699)

*Issue #, if available:*

*Description of changes:*
- This PR changes the input argument of `GSTaskTrackerAbc` from a `GSConfig` object to an integer, because `GSTaskTrackerAbc` only needs an integer to set its `log_report_frequency` attribute.
- Using the `GSConfig` object prevented users from using a task tracker to monitor a running process, because constructing a `GSConfig` is not part of the public API and is complex.
- Decoupling `GSTaskTracker` from `GSConfig` lets users construct task trackers and use them with the GraphStorm programming APIs.
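The change can be sketched in a few lines (class names from this PR; the frequency value 100 is just an illustrative choice, and the class bodies are simplified stand-ins for the real implementations):

```python
class GSTaskTrackerAbc:
    """Base tracker after #699: takes an int instead of a GSConfig object."""
    def __init__(self, log_report_frequency):
        self._report_frequency = log_report_frequency  # Can be None if not provided

class GSSageMakerTaskTracker(GSTaskTrackerAbc):
    """Built-in tracker; inherits the simplified constructor."""

# Before this PR: GSSageMakerTaskTracker(config) required a full GSConfig.
# Now an integer such as config.eval_frequency is enough:
tracker = GSSageMakerTaskTracker(100)
```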

By submitting this pull request, I confirm that you can use, modify,
copy, and redistribute this contribution, under the terms of your
choice.

---------

Co-authored-by: Ubuntu <[email protected]>
Co-authored-by: xiang song(charlie.song) <[email protected]>
3 people authored Jan 11, 2024
1 parent b57fbe1 commit 84ab65b
Showing 8 changed files with 16 additions and 19 deletions.
2 changes: 1 addition & 1 deletion docs/source/advanced/own-models.rst
@@ -272,7 +272,7 @@ The GraphStorm trainers can have evaluators and task trackers associated. The fo
config.early_stop_strategy)
trainer.setup_evaluator(evaluator)
# Optional: set up a task tracker to show the progress of training.
- tracker = GSSageMakerTaskTracker(config)
+ tracker = GSSageMakerTaskTracker(config.eval_frequency)
trainer.setup_task_tracker(tracker)
GraphStorm's `evaluators <https://github.com/awslabs/graphstorm/blob/main/python/graphstorm/eval/evaluator.py>`_ can help compute the required evaluation metrics, such as ``accuracy``, ``f1``, and ``mrr``. Users can select the proper evaluator and use the trainer's ``setup_evaluator()`` method to attach it. GraphStorm's `task trackers <https://github.com/awslabs/graphstorm/blob/main/python/graphstorm/tracker/graphstorm_tracker.py>`_ serve as log collectors, which are used to show progress information.
7 changes: 1 addition & 6 deletions docs/source/configuration/configuration-run.rst
@@ -126,11 +126,6 @@ GraphStorm provides a set of parameters to control how and where to save and res
- Yaml: ``task_tracker: sagemaker_task_tracker``
- Argument: ``--task_tracker sagemaker_task_tracker``
- Default value: ``sagemaker_task_tracker``
- - **log_report_frequency**: The frequency of reporting model performance metrics through task_tracker. The frequency is defined by using number of iterations, i.e., every N iterations the evaluation metrics will be reported. (Please note the evaluation metrics should be generated at the reporting iteration. See "eval_frequency" for how evaluation frequency is controlled.)
-
-   - Yaml: ``log_report_frequency: 1000``
-   - Argument: ``--log-report-frequency 1000``
-   - Default value: ``1000``
- **restore_model_path**: A path where GraphStorm model parameters were saved. For training, if restore_model_path is set, GraphStom will retrieve the model parameters from restore_model_path instead of initializing the parameters. For inference, restore_model_path must be provided.

- Yaml: ``restore_model_path: /model/checkpoint/``
@@ -278,7 +273,7 @@ GraphStorm provides a set of parameters to control model evaluation.
- Yaml: ``use_mini_batch_infer: false``
- Argument: ``--use-mini-batch-infer false``
- Default value: ``true``
- - **eval_frequency**: The frequency of doing evaluation. GraphStorm trainers do evaluation at the end of each epoch. However, for large-scale graphs, training one epoch may take hundreds of thousands of iterations. One may want to do evaluations in the middle of an epoch. When eval_frequency is set, every **eval_frequency** iterations, the trainer will do evaluation once. The evaluation results can be printed and reported. See **log_report_frequency** for more details.
+ - **eval_frequency**: The frequency of doing evaluation. GraphStorm trainers do evaluation at the end of each epoch. However, for large-scale graphs, training one epoch may take hundreds of thousands of iterations. One may want to do evaluations in the middle of an epoch. When eval_frequency is set, every **eval_frequency** iterations, the trainer will do evaluation once. The evaluation results can be printed and reported.

- Yaml: ``eval_frequency: 10000``
- Argument: ``--eval-frequency 10000``
2 changes: 1 addition & 1 deletion examples/customized_models/HGT/hgt_nc.py
@@ -335,7 +335,7 @@ def main(args):
config.early_stop_strategy)
trainer.setup_evaluator(evaluator)
# Optional: set up a task tracker to show the progress of training.
- tracker = GSSageMakerTaskTracker(config)
+ tracker = GSSageMakerTaskTracker(config.eval_frequency)
trainer.setup_task_tracker(tracker)

# Start the training process.
2 changes: 1 addition & 1 deletion examples/peft_llm_gnn/main_nc.py
@@ -62,7 +62,7 @@ def main(config_args):
config.early_stop_strategy,
)
trainer.setup_evaluator(evaluator)
- tracker = GSSageMakerTaskTracker(config)
+ tracker = GSSageMakerTaskTracker(config.eval_frequency)
trainer.setup_task_tracker(tracker)

# create train loader
3 changes: 0 additions & 3 deletions examples/peft_llm_gnn/nc_config_Video_Games.yaml
@@ -19,11 +19,8 @@ gsf:
batch_size: 4
dropout: 0.1
eval_batch_size: 4
- # eval_frequency: 100
- #log_report_frequency: 50
lr: 0.0001
num_epochs: 10
- # save_model_frequency: 300
wd_l2norm: 1.0e-06
input:
restore_model_path: null
2 changes: 1 addition & 1 deletion python/graphstorm/gsf.py
@@ -656,4 +656,4 @@ def check_homo(g):

def create_builtin_task_tracker(config):
tracker_class = get_task_tracker_class(config.task_tracker)
- return tracker_class(config)
+ return tracker_class(config.eval_frequency)
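A hedged sketch of what the updated factory in `gsf.py` now does: resolve the tracker class by name, then pass only the integer it needs. The `_TRACKERS` dict and the two-argument signature here are illustrative; GraphStorm's real factory takes a config and resolves classes via `get_task_tracker_class()`.

```python
class GSSageMakerTaskTracker:
    """Simplified stand-in for GraphStorm's built-in tracker."""
    def __init__(self, log_report_frequency):
        self._report_frequency = log_report_frequency

# Illustrative registry; not GraphStorm's actual lookup mechanism.
_TRACKERS = {"sagemaker_task_tracker": GSSageMakerTaskTracker}

def create_builtin_task_tracker(task_tracker_name, eval_frequency):
    tracker_class = _TRACKERS[task_tracker_name]
    # After this commit the class receives an int, not the whole config.
    return tracker_class(eval_frequency)

tracker = create_builtin_task_tracker("sagemaker_task_tracker", 10000)
```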
10 changes: 6 additions & 4 deletions python/graphstorm/tracker/graphstorm_tracker.py
@@ -22,11 +22,13 @@ class GSTaskTrackerAbc():
Parameters
----------
- config: GSConfig
-     Configurations. Users can add their own configures in the yaml config file.
+ log_report_frequency: int
+     The frequency of reporting model performance metrics through task_tracker.
+     The frequency is defined by using number of iterations, i.e., every N iterations
+     the evaluation metrics will be reported.
"""
- def __init__(self, config):
-     self._report_frequency = config.log_report_frequency # Can be None if not provided
+ def __init__(self, log_report_frequency):
+     self._report_frequency = log_report_frequency # Can be None if not provided

@abc.abstractmethod
def log_metric(self, metric_name, metric_value, step, force_report=False):
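With the base class taking only an integer, a custom tracker can now be built without any `GSConfig`. A hypothetical subclass sketch (the `log_metric` signature comes from the diff; `PrintTracker` and its reporting rule are invented for illustration):

```python
import abc

class GSTaskTrackerAbc:
    """Base tracker after this change: constructed from an int, not GSConfig."""
    def __init__(self, log_report_frequency):
        self._report_frequency = log_report_frequency  # Can be None if not provided

    @abc.abstractmethod
    def log_metric(self, metric_name, metric_value, step, force_report=False):
        """Record one evaluation metric at a given iteration."""

class PrintTracker(GSTaskTrackerAbc):
    """Hypothetical user-defined tracker that collects reported metrics."""
    def __init__(self, log_report_frequency):
        super().__init__(log_report_frequency)
        self.reported = []

    def log_metric(self, metric_name, metric_value, step, force_report=False):
        # Report on forced calls or every N-th iteration (illustrative rule).
        if force_report or (self._report_frequency and step % self._report_frequency == 0):
            self.reported.append((metric_name, metric_value, step))

tracker = PrintTracker(100)
tracker.log_metric("accuracy", 0.90, 50)   # step 50: not a reporting step, skipped
tracker.log_metric("accuracy", 0.92, 100)  # step 100: reported
```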
7 changes: 5 additions & 2 deletions python/graphstorm/tracker/sagemaker_tracker.py
@@ -25,8 +25,11 @@ class GSSageMakerTaskTracker(GSTaskTrackerAbc):
Parameters
----------
- config: GSConfig
-     Configurations. Users can add their own configures in the yaml config file.
+ log_report_frequency: int
+     The frequency of reporting model performance metrics through task_tracker.
+     The frequency is defined by using number of iterations, i.e., every N iterations
+     the evaluation metrics will be reported.
"""

def _do_report(self, step):
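The `_do_report` body is truncated in the diff above; it can be sketched as a simple modulo check — a hedged reconstruction of what such a reporting gate typically looks like, not the library's exact code:

```python
class GSSageMakerTaskTracker:
    """Illustrative reconstruction of the SageMaker tracker's reporting gate."""
    def __init__(self, log_report_frequency):
        self._report_frequency = log_report_frequency  # Can be None if not provided

    def _do_report(self, step):
        # Report every N iterations; never report when no frequency is set.
        if self._report_frequency is None:
            return False
        return step % self._report_frequency == 0

t = GSSageMakerTaskTracker(1000)
```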
