Run training jobs without ssh in the local machine. (#712)
*Description of changes:*
When running a training job on the local machine, we can avoid using
ssh.
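
The idea described above can be sketched as follows. This is an illustration only, not GraphStorm's launcher code: `build_launch_command` and `LOCAL_HOSTS` are hypothetical names, and treating `127.0.0.1` and `localhost` as "the local machine" is an assumption made for the example.

```python
import shlex
import subprocess
import sys

# Assumption for this sketch: these addresses mean "this machine".
LOCAL_HOSTS = {"127.0.0.1", "localhost"}


def build_launch_command(cmd, host=None, ssh_port=22):
    """Return the argv to run `cmd` locally or on `host` over ssh.

    Hypothetical helper: when no host is given, or the host is local,
    the command is returned unchanged so no ssh daemon is required.
    """
    if host is None or host in LOCAL_HOSTS:
        # Local job: run the command directly, no ssh involved.
        return list(cmd)
    # Remote job: quote the command into one string and hand it to ssh.
    return ["ssh", "-p", str(ssh_port), host, shlex.join(cmd)]


if __name__ == "__main__":
    # A harmless local "job" that runs without any ssh setup.
    job = [sys.executable, "-c", "print('hello from a local job')"]
    subprocess.run(build_launch_command(job), check=True)
```

With a helper of this shape, a single-machine run needs neither an IP list file nor an ssh port, which matches the documentation changes in this commit.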

By submitting this pull request, I confirm that you can use, modify,
copy, and redistribute this contribution, under the terms of your
choice.

---------

Co-authored-by: Ubuntu <[email protected]>
zheng-da and Ubuntu authored Jan 24, 2024
1 parent 9bb4232 commit e1482e0
Showing 5 changed files with 284 additions and 88 deletions.
7 changes: 6 additions & 1 deletion docs/source/install/env-setup.rst
@@ -47,7 +47,12 @@ For CPU environment:
Configure SSH No-password login
................................
- Use the following commands to configure a local SSH no-password login that GraphStorm relies on.
+ To perform distributed training in a cluster of machines, please use the following commands
+ to configure a local SSH no-password login that GraphStorm relies on.
+
+ .. note::
+
+ This is not needed for the standalone mode.

.. code-block:: bash
8 changes: 1 addition & 7 deletions docs/source/tutorials/own-data.rst
@@ -8,12 +8,6 @@ It is easy for users to prepare their own graph data and leverage GraphStorm's b
* Step 2: Modify the GraphStorm configuration YAML file.
* Step 3: Launch GraphStorm commands for training/inference.

- .. warning::
-
- - All commands below are designed to run in a GraphStorm Docker container. Please refer to the :ref:`GraphStorm Docker environment setup<setup>` to prepare the Docker container environment.
-
- - If you set up the :ref:`GraphStorm environment with pip Packages<setup_pip>`, please replace all occurrences of "2222" in the argument ``--ssh-port`` with **22**, and clone GraphStorm toolkits. If use this method to setup GraphStorm environment, you may need to replace the ``python3`` command with ``python``, depending on your Python versions.
-
Step 1: Prepare Your Own Graph Data
-------------------------------------
There are two options to prepare your own graph data for using GraphStorm:
@@ -467,4 +461,4 @@ Similar to the :ref:`Quick-Start <quick-start-standalone>` tutorial, users can l
--restore-model-path /tmp/acm_lp/models/epoch-0 \
--save-embed-path /tmp/acm_lp/embeds
- Once users get familiar with the three steps of using your own graph data, the next step would be look through :ref:`GraphStorm's Configurations<configurations>` that control the three steps for your specific requirements.
+ Once users get familiar with the three steps of using your own graph data, the next step would be look through :ref:`GraphStorm's Configurations<configurations>` that control the three steps for your specific requirements.
22 changes: 1 addition & 21 deletions docs/source/tutorials/quick-start.rst
@@ -12,12 +12,6 @@ GraphStorm is designed for easy-to-use GML models, particularly the graph neural

This tutorial will use GraphStorm's built-in OGB-arxiv dataset for a node classification task to demonstrate these three steps.

- .. warning::
-
- - All commands below are designed to run in a GraphStorm Docker container. Please refer to the :ref:`GraphStorm Docker environment setup<setup>` to prepare the Docker container environment.
-
- - If you set up the :ref:`GraphStorm environment with pip Packages<setup_pip>`, please replace all occurrences of "2222" in the argument ``--ssh-port`` with **22**, and clone GraphStorm toolkits. And if use this method to setup GraphStorm environment, you may need to replace the ``python3`` command with ``python``, depending on your Python versions.
-
Download and Partition OGB-arxiv Data
--------------------------------------
First run the below command.
@@ -57,14 +51,8 @@ Running the following command can download the ogbn-arxiv graph data and split t

Launch Training
-----------------
- GraphStorm currently relies on **ssh** to launch its scripts. Therefore before launch any scripts, users need to create an IP address file, which contains all private IP addresses in a cluster. If run GraphStorm in the Standalone mode, which run only in a **single machine**, as this tutorial does, users only need to run the following command to create an ``ip_list.txt`` file that has one row '**127.0.0.1**' as its content.
-
- .. code-block:: bash
- touch /tmp/ip_list.txt
- echo 127.0.0.1 > /tmp/ip_list.txt
-
- Then run the below command to start a training job that trains an built-in RGCN model to perform node classification on the OGB-arxiv.
+ Run the below command to start a training job that trains an built-in RGCN model to perform node classification on the OGB-arxiv.

.. code-block:: bash
@@ -74,8 +62,6 @@ Then run the below command to start a training job that trains an built-in RGCN
--num-servers 1 \
--num-samplers 0 \
--part-config /tmp/ogbn_arxiv_nc_1p/ogbn-arxiv.json \
- --ip-config /tmp/ip_list.txt \
- --ssh-port 2222 \
--cf /graphstorm/training_scripts/gsgnn_np/arxiv_nc.yaml \
--save-model-path /tmp/ogbn-arxiv-nc/models
@@ -134,8 +120,6 @@ The inference command is:
--num-servers 1 \
--num-samplers 0 \
--part-config /tmp/ogbn_arxiv_nc_1p/ogbn-arxiv.json \
- --ip-config /tmp/ip_list.txt \
- --ssh-port 2222 \
--cf /graphstorm/training_scripts/gsgnn_np/arxiv_nc.yaml \
--save-prediction-path /tmp/ogbn-arxiv-nc/predictions/ \
--restore-model-path /tmp/ogbn-arxiv-nc/models/epoch-7/
@@ -153,8 +137,6 @@ Inference on link prediction is similar as shown in the command below.
--num-servers 1 \
--num-samplers 0 \
--part-config /tmp/ogbn_arxiv_lp_1p/ogbn-arxiv.json \
- --ip-config /tmp/ip_list.txt \
- --ssh-port 2222 \
--cf /graphstorm/training_scripts/gsgnn_lp/arxiv_lp.yaml \
--save-embed-path /tmp/ogbn-arxiv-lp/predictions/ \
--restore-model-path /tmp/ogbn-arxiv-lp/models/epoch-2/
@@ -171,8 +153,6 @@ If users only need to generate node embeddings instead of doing predictions on t
--workspace /tmp/ogbn-arxiv-nc \
--num-trainers 1 \
--part-config /tmp/ogbn_arxiv_nc_1p/ogbn-arxiv.json \
- --ip-config /tmp/ip_list.txt \
- --ssh-port 2222 \
--cf /graphstorm/training_scripts/gsgnn_np/arxiv_nc.yaml \
--save-embed-path /tmp/ogbn-arxiv-nc/saved_embed \
--restore-model-path /tmp/ogbn-arxiv-nc/models/epoch-7/ \