From 63a41fe6cb8edec1d3b9933baf4a6b2b880d06dc Mon Sep 17 00:00:00 2001
From: Xiang Song <xiangsx@amazon.com>
Date: Fri, 9 Aug 2024 12:40:13 -0700
Subject: [PATCH 1/9] Update

---
 docs/source/advanced/multi-task-learning.rst | 15 +++++++++++++++
 1 file changed, 15 insertions(+)
diff --git a/docs/source/advanced/multi-task-learning.rst b/docs/source/advanced/multi-task-learning.rst
index 214b1c22de..6a3b538fec 100644
--- a/docs/source/advanced/multi-task-learning.rst
+++ b/docs/source/advanced/multi-task-learning.rst
@@ -318,3 +318,18 @@ GraphStorm supports to run multi-task inference on :ref:`SageMaker<distributed-s
         --instance-count <INSTANCE_COUNT> \
         --instance-type <INSTANCE_TYPE>
 
+Multi-task Learning Output
+--------------------------
+
+Saved Node Embeddings
+~~~~~~~~~~~~~~~~~~~~~~
+When ``save_embed_path`` is provided in the training config or inference condig,
+GraphStorm will save the node embeddings in the corresponding path.
+In multi-task learning, xxx
+
+
+Saved Prediction Results
+~~~~~~~~~~~~~~~~~~~~~~~~~
+When ``save_prediction_path`` is provided in the inference condig,
+GraphStorm will save the prediction results in the corresponding path.
+In multi-task learning, xxx

From 9ba619e2762d8ae2a20c04e5f94458b2118776bd Mon Sep 17 00:00:00 2001
From: Xiang Song <xiangsx@amazon.com>
Date: Mon, 12 Aug 2024 18:12:51 -0700
Subject: [PATCH 2/9] Update

---
 .../cli/model-training-inference/index.rst    |   1 +
 .../cli/model-training-inference/output.rst   | 160 ++++++++++++++++++
 2 files changed, 161 insertions(+)
 create mode 100644 docs/source/cli/model-training-inference/output.rst
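
Note for reviewers (below the three-dash line, so not part of the commit): the
layout documented in output.rst implies that the chunk files of each node type
can be enumerated from ``emb_info.json`` alone. The sketch below is only an
illustration of that reading, not GraphStorm code. It assumes the chunk index
is zero-padded to five digits and runs from 0 to ``world_size - 1``, as the
example file names suggest; ``expected_embedding_files`` is a hypothetical
helper name.

```python
import json
from pathlib import PurePosixPath


def expected_embedding_files(emb_dir, emb_info):
    """Map each node type to the (node ID file, embedding file) pairs it should have.

    Assumption: one pair per chunk, numbered 00000 .. world_size - 1.
    """
    files = {}
    for ntype in emb_info["emb_name"]:
        ntype_dir = PurePosixPath(emb_dir) / ntype
        files[ntype] = [
            (
                str(ntype_dir / f"embed_nids-{rank:05d}.pt"),
                str(ntype_dir / f"embed-{rank:05d}.pt"),
            )
            for rank in range(emb_info["world_size"])
        ]
    return files


# Metadata shaped like the three documented fields of emb_info.json.
emb_info = json.loads(
    '{"format": "pytorch", "emb_name": ["ntype0", "ntype1"], "world_size": 2}'
)
layout = expected_embedding_files("emb_dir", emb_info)
for ntype, pairs in layout.items():
    print(ntype, pairs)
```

Each pair keeps the node ID file next to its embedding file because row i of one
corresponds to row i of the other.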

diff --git a/docs/source/cli/model-training-inference/index.rst b/docs/source/cli/model-training-inference/index.rst
index 15eac260ee..ff91b6d093 100644
--- a/docs/source/cli/model-training-inference/index.rst
+++ b/docs/source/cli/model-training-inference/index.rst
@@ -17,4 +17,5 @@ In addition, there are two node ID mapping operations during the graph construct
    single-machine-training-inference
    distributed/cluster
    distributed/sagemaker
+   output
    output-remapping
diff --git a/docs/source/cli/model-training-inference/output.rst b/docs/source/cli/model-training-inference/output.rst
new file mode 100644
index 0000000000..4f4d763959
--- /dev/null
+++ b/docs/source/cli/model-training-inference/output.rst
@@ -0,0 +1,160 @@
+.. _gs-output:
+
+GraphStorm Output
+=================
+
+
+.. _gs-output-embs:
+
+Saved Node Embeddings
+---------------------
+When ``save_embed_path`` is provided in the training config or inference condig,
+GraphStorm will save the node embeddings in the corresponding path. The node embeddings
+of each node type are saved separately under different sub-directories named with
+the corresponding node types. GraphStorm will also save an ``emb_info.json`` file,
+which contains all the metadata for the saved node embeddings. The ``save_embed_path``
+will look like following:
+
+.. code-block:: bash
+
+    emb_dir/
+        ntype0/
+            embed_nids-00000.pt
+            embed_nids-00001.pt
+            ...
+            embed-00000.pt
+            embed-00001.pt
+            ...
+        ntype1/
+            embed_nids-00000.pt
+            embed_nids-00001.pt
+            ...
+            embed-00000.pt
+            embed-00001.pt
+            ...
+        ...
+        emb_info.json
+
+The ``embed_nids-*`` files store the integer node IDs of each node embedding and
+the ``embed-*`` files store the corresponding node embeddings.
+The content of ``embed_nids-*`` files and ``embed-*`` files looks like:
+
+.. code-block::
+
+    embed_nids-00000.pt  |   embed-00000.pt
+                         |
+    Graph Node ID        |   embeddings
+    10                   |   0.112,0.123,-0.011,...
+    1                    |   0.872,0.321,-0.901,...
+    23                   |   0.472,0.432,-0.732,...
+    ...
+
+The ``emb_info.json`` stores three informations:
+  * ``format``: The format of the saved embeddings. By default, it is ``pytorch``.
+  * ``emb_name``: A list of node types that have node embeddings saved. For example: ["ntype0", "ntype1"]
+  * ``world_size``: The number of chunks (files) into which the node embeddings of a particular node type are divided. For instance, if world_size is set to 8, there will be 8 files for each set of node embeddings."
+
+.. _gs-output-predictions:
+
+Saved Prediction Results
+------------------------
+When ``save_prediction_path`` is provided in the inference condig,
+GraphStorm will save the prediction results in the corresponding path.
+For node prediction tasks, the prediction results are saved per node type.
+GraphStorm will also save an ``result_info.json`` file, which contains all
+the metadata for the saved prediction results. The ``save_prediction_path``
+will look like following:
+
+.. code-block:: bash
+
+    prediction_dir/
+        ntype0/
+            predict-00000.pt
+            predict-00001.pt
+            ...
+            predict_nids-00000.pt
+            predict_nids-00001.pt
+            ...
+        ntype1/
+            predict-00000.pt
+            predict-00001.pt
+            ...
+            predict_nids-00000.pt
+            predict_nids-00001.pt
+            ...
+        ...
+        result_info.json
+
+The ``predict_nids-*`` files store the integer node IDs of each prediction result and
+the ``predict-*`` files store the corresponding prediction results.
+The contents of ``predict_nids-*`` files and ``predict-*`` files look like:
+
+.. code-block::
+
+    predict_nids-00000.pt  |   predict-00000.pt
+                           |
+    Graph Node ID          |   Prediction results
+    10                     |   0.112
+    1                      |   0.872
+    23                     |   0.472
+    ...
+
+The ``result_info.json`` stores three informations:
+  * ``format``: The format of the saved prediction results. By default, it is ``pytorch``.
+  * ``emb_name``: A list of node types that have node prediction results saved. For example: ["ntype0", "ntype1"]
+  * ``world_size``: The number of chunks (files) into which the prediction results of a particular node type are divided. For instance, if world_size is set to 8, there will be 8 files for each set of prediction results."
+
+
+For edge prediction tasks, the prediction results are saved per edge type.
+The sub-directory for an edge type is named as ``<src_ntype>_<relation_type>_<dst_ntype>``.
+For instance, given an edge type ``("movie","rated-by","user")``, the corresponding
+sub-directory is named as ``movie_rated-by_user``.
+GraphStorm will also save an ``result_info.json`` file, which contains all
+the metadata for the saved prediction results. The ``save_prediction_path``
+will look like following:
+
+.. code-block:: bash
+
+    prediction_dir/
+        etype0/
+            predict-00000.pt
+            predict-00001.pt
+            ...
+            src_nids-00000.pt
+            src_nids-00001.pt
+            ...
+            dst_nids-00000.pt
+            dst_nids-00001.pt
+            ...
+        etype1/
+            predict-00000.pt
+            predict-00001.pt
+            ...
+            src_nids-00000.pt
+            src_nids-00001.pt
+            ...
+            dst_nids-00000.pt
+            dst_nids-00001.pt
+            ...
+        ...
+        result_info.json
+
+The ``src_nids-*`` and ``dst_nids-*`` files contain the integer node IDs for
+the source and destination nodes of each prediction, respectively.
+The ``predict-*`` files store the corresponding prediction results.
+The contents of ``src_nids-*``, ``dst_nids-*`` and ``predict-*`` files look like:
+
+.. code-block::
+
+    src_nids-00000.pt   |   dst_nids-00000.pt   |   predict-00000.pt
+                        |                       |
+    Source Node ID      |   Destination Node ID |   Prediction results
+    10                  |   12                  |   0.112
+    1                   |   20                  |   0.872
+    23                  |   3                   |   0.472
+    ...
+
+The ``result_info.json`` stores three informations:
+  * ``format``: The format of the saved prediction results. By default, it is ``pytorch``.
+  * ``etypes``: A list of edge types that have edge prediction results saved. For example: [("movie","rated-by","user"), ("user","watched","movie")]
+  * ``world_size``: The number of chunks (files) into which the prediction results of a particular edge type are divided. For instance, if world_size is set to 8, there will be 8 files for each set of prediction results."
\ No newline at end of file

From 1a8d97fcc26f6eed3448ac72dc3eace4628b14e0 Mon Sep 17 00:00:00 2001
From: Xiang Song <xiangsx@amazon.com>
Date: Mon, 12 Aug 2024 22:52:42 -0700
Subject: [PATCH 3/9] Update

---
 docs/source/advanced/multi-task-learning.rst  | 115 +++++++++++++++++-
 .../cli/model-training-inference/output.rst   |  13 +-
 2 files changed, 124 insertions(+), 4 deletions(-)
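
Note for reviewers (below the three-dash line, so not part of the commit): the
task id format ``<task_type>-<ntype/etype(s)>-<label>`` and the edge type
sub-directory name ``<src_ntype>_<relation_type>_<dst_ntype>`` described in this
patch can be sketched as plain string formatting. This is a minimal
illustration of the naming rule as documented, not GraphStorm's own
implementation; ``etype_dir_name`` and ``task_id`` are hypothetical names.

```python
def etype_dir_name(etype):
    """Join an edge type triple as <src_ntype>_<relation_type>_<dst_ntype>."""
    src_ntype, relation_type, dst_ntype = etype
    return f"{src_ntype}_{relation_type}_{dst_ntype}"


def task_id(task_type, target, label=None):
    """Build <task_type>-<ntype/etype>-<label>; the label part is optional.

    `target` is a node type string or an edge type triple.
    """
    target_name = target if isinstance(target, str) else etype_dir_name(target)
    parts = [task_type, target_name]
    if label is not None:
        parts.append(label)
    return "-".join(parts)


cite = ("paper", "cite", "paper")
print(task_id("node_classification", "paper", "venue"))
print(task_id("link_prediction", cite))
print(task_id("edge_regression", cite, "year"))
```

The three calls reproduce the task ids used as examples in the documentation
text of this patch.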

diff --git a/docs/source/advanced/multi-task-learning.rst b/docs/source/advanced/multi-task-learning.rst
index 6a3b538fec..1f4a9f2823 100644
--- a/docs/source/advanced/multi-task-learning.rst
+++ b/docs/source/advanced/multi-task-learning.rst
@@ -325,11 +325,122 @@ Saved Node Embeddings
 ~~~~~~~~~~~~~~~~~~~~~~
 When ``save_embed_path`` is provided in the training config or inference condig,
 GraphStorm will save the node embeddings in the corresponding path.
-In multi-task learning, xxx
+In multi-task learning, by default, GraphStorm will save the node embeddings
+produced by the GNN layer for every node type under the path specified by
+``save_embed_path``。 The output format follows the :ref:`GraphStorm saved node embeddings
+format<gs-out-embs>`. Meanwhile, in multi-task learning, certain tasks might apply
+task specific normalization to node embeddings. For instance, a link prediction
+task might apply l2 normalization on each node embeddings. In certain cases, GraphStorm
+will also save the normalized node embeddings under ``save_embed_path``.
+The task specific node embeddings are saved separately under different sub-directories
+named with the corresponding task id. (A task id is formated as ``<task_type>-<ntype/etype(s)>-<label>``.
+For instance, the task id of a node classification task on the node type ``paper`` with the
+label filed ``venue`` will be ``node_classification-paper-venue``. As another example,
+the task id of a link prediction task on the edge type ``(paper, cite, paper)`` will be
+``link_prediction-paper_cite_paper``
+and the task id of a edge regression task on the edge type ``(paper, cite, paper)`` with
+the label field ``year`` will be ``edge_regression-paper_cite_paper-year``).
+The output format of task specific node embeddings follows
+the :ref:`GraphStorm saved node embeddings format<gs-out-embs>`.
+The ``save_embed_path`` in multi-task learning will look like following:
 
+.. code-block:: bash
+
+    emb_dir/
+        ntype0/
+            embed_nids-00000.pt
+            embed_nids-00001.pt
+            ...
+            embed-00000.pt
+            embed-00001.pt
+            ...
+        ntype1/
+            embed_nids-00000.pt
+            embed_nids-00001.pt
+            ...
+            embed-00000.pt
+            embed-00001.pt
+            ...
+        emb_info.json
+        link_prediction-paper_cite_paper/
+            ntype0/
+                embed_nids-00000.pt
+                embed_nids-00001.pt
+                ...
+                embed-00000.pt
+                embed-00001.pt
+                ...
+            ntype1/
+                embed_nids-00000.pt
+                embed_nids-00001.pt
+                ...
+                embed-00000.pt
+                embed-00001.pt
+                ...
+            emb_info.json
+        edge_regression-paper_cite_paper-year/
+            ntype0/
+                embed_nids-00000.pt
+                embed_nids-00001.pt
+                ...
+                embed-00000.pt
+                embed-00001.pt
+                ...
+            ntype1/
+                embed_nids-00000.pt
+                embed_nids-00001.pt
+                ...
+                embed-00000.pt
+                embed-00001.pt
+                ...
+            emb_info.json
+
+In the above example both the link prediction task and the edge regression
+apply task specific normalization on node embeddings.
+
+**Note: The built-in GraphStorm training or inference pipeline
+(launched by GraphStorm CLI) will process each saved node embeddings
+to convert the integer node ids into the raw node ids, which are usually string node ids..**
+Details can be found in :ref:`GraphStorm Output Node ID Remapping<output-remapping>`
 
 Saved Prediction Results
 ~~~~~~~~~~~~~~~~~~~~~~~~~
 When ``save_prediction_path`` is provided in the inference condig,
 GraphStorm will save the prediction results in the corresponding path.
-In multi-task learning, xxx
+In multi-task learning inference, each prediction task will have its prediction
+results saved separately under different sub-directories
+named with the
+corresponding task id. The output format of task specific prediction results
+follows the :ref:`GraphStorm saved prediction result format<gs-out-predictions>`.
+The ``save_prediction_path`` in multi-task learning will look like following:
+
+.. code-block:: bash
+
+    prediction_dir/
+        edge_regression-paper_cite_paper-year/
+            paper_cite_paper/
+                predict-00000.pt
+                predict-00001.pt
+                ...
+                src_nids-00000.pt
+                src_nids-00001.pt
+                ...
+                dst_nids-00000.pt
+                dst_nids-00001.pt
+                ...
+            result_info.json
+        node_classification-paper-venue/
+            paper/
+                predict-00000.pt
+                predict-00001.pt
+                ...
+                predict_nids-00000.pt
+                predict_nids-00001.pt
+                ...
+            result_info.json
+        ...
+
+**Note: The built-in GraphStorm inference pipeline
+(launched by GraphStorm CLI) will process each saved prediction result
+to convert the integer node ids into the raw node ids, which are usually string node ids.**
+Details can be found in :ref:`GraphStorm Output Node ID Remapping<output-remapping>`
diff --git a/docs/source/cli/model-training-inference/output.rst b/docs/source/cli/model-training-inference/output.rst
index 4f4d763959..42041f0855 100644
--- a/docs/source/cli/model-training-inference/output.rst
+++ b/docs/source/cli/model-training-inference/output.rst
@@ -3,7 +3,6 @@
 GraphStorm Output
 =================
 
-
 .. _gs-output-embs:
 
 Saved Node Embeddings
@@ -54,6 +53,11 @@ The ``emb_info.json`` stores three informations:
   * ``emb_name``: A list of node types that have node embeddings saved. For example: ["ntype0", "ntype1"]
   * ``world_size``: The number of chunks (files) into which the node embeddings of a particular node type are divided. For instance, if world_size is set to 8, there will be 8 files for each set of node embeddings."
 
+**Note: The built-in GraphStorm training or inference pipeline
+(launched by GraphStorm CLI) will process the saved node embeddings
+to convert the integer node ids into the raw node ids, which are usually string node ids..**
+Details can be found in :ref:`GraphStorm Output Node ID Remapping<output-remapping>`
+
 .. _gs-output-predictions:
 
 Saved Prediction Results
@@ -157,4 +161,9 @@ The content of ``src_nids-*``, ``dst_nids-*`` and ``predict-*`` files looks like
 The ``result_info.json`` stores three informations:
   * ``format``: The format of the saved prediction results. By default, it is ``pytorch``.
   * ``etypes``: A list of edge types that have edge prediction results saved. For example: [("movie","rated-by","user"), ("user","watched","movie")]
-  * ``world_size``: The number of chunks (files) into which the prediction results of a particular edge type are divided. For instance, if world_size is set to 8, there will be 8 files for each set of prediction results."
\ No newline at end of file
+  * ``world_size``: The number of chunks (files) into which the prediction results of a particular edge type are divided. For instance, if world_size is set to 8, there will be 8 files for each set of prediction results."
+
+**Note: The built-in GraphStorm inference pipeline
+(launched by GraphStorm CLI) will process the saved prediction results
+to convert the integer node ids into the raw node ids, which are usually string node ids.**
+Details can be found in :ref:`GraphStorm Output Node ID Remapping<output-remapping>`

From 8d6a8d3ca6d995f0815f5a882114777212d99efb Mon Sep 17 00:00:00 2001
From: "xiang song(charlie.song)" <classicxsong@gmail.com>
Date: Tue, 13 Aug 2024 21:11:35 -0700
Subject: [PATCH 4/9] Update docs/source/advanced/multi-task-learning.rst

Co-authored-by: Jian Zhang (James) <6593865@qq.com>
---
 docs/source/advanced/multi-task-learning.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/advanced/multi-task-learning.rst b/docs/source/advanced/multi-task-learning.rst
index 1f4a9f2823..e7eb16d09a 100644
--- a/docs/source/advanced/multi-task-learning.rst
+++ b/docs/source/advanced/multi-task-learning.rst
@@ -323,7 +323,7 @@ Multi-task Learning Output
 
 Saved Node Embeddings
 ~~~~~~~~~~~~~~~~~~~~~~
-When ``save_embed_path`` is provided in the training config or inference condig,
+When ``save_embed_path`` is provided in the training configuration or the inference configuration,
 GraphStorm will save the node embeddings in the corresponding path.
 In multi-task learning, by default, GraphStorm will save the node embeddings
 produced by the GNN layer for every node type under the path specified by

From 6012b4f7bedb808d31305c81d17639ec042ff184 Mon Sep 17 00:00:00 2001
From: "xiang song(charlie.song)" <classicxsong@gmail.com>
Date: Tue, 13 Aug 2024 21:11:48 -0700
Subject: [PATCH 5/9] Update docs/source/advanced/multi-task-learning.rst

Co-authored-by: Jian Zhang (James) <6593865@qq.com>
---
 docs/source/advanced/multi-task-learning.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/advanced/multi-task-learning.rst b/docs/source/advanced/multi-task-learning.rst
index e7eb16d09a..0c9d519606 100644
--- a/docs/source/advanced/multi-task-learning.rst
+++ b/docs/source/advanced/multi-task-learning.rst
@@ -327,7 +327,7 @@ When ``save_embed_path`` is provided in the training configuration or the infere
 GraphStorm will save the node embeddings in the corresponding path.
 In multi-task learning, by default, GraphStorm will save the node embeddings
 produced by the GNN layer for every node type under the path specified by
-``save_embed_path``。 The output format follows the :ref:`GraphStorm saved node embeddings
+``save_embed_path``. The output format follows the :ref:`GraphStorm saved node embeddings
 format<gs-out-embs>`. Meanwhile, in multi-task learning, certain tasks might apply
 task specific normalization to node embeddings. For instance, a link prediction
 task might apply l2 normalization on each node embeddings. In certain cases, GraphStorm

From 6bbb77f33348e03ad6a4b959ab4b27093b3c379b Mon Sep 17 00:00:00 2001
From: "xiang song(charlie.song)" <classicxsong@gmail.com>
Date: Tue, 13 Aug 2024 21:11:56 -0700
Subject: [PATCH 6/9] Update docs/source/advanced/multi-task-learning.rst

Co-authored-by: Jian Zhang (James) <6593865@qq.com>
---
 docs/source/advanced/multi-task-learning.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/advanced/multi-task-learning.rst b/docs/source/advanced/multi-task-learning.rst
index 0c9d519606..6c48b552f3 100644
--- a/docs/source/advanced/multi-task-learning.rst
+++ b/docs/source/advanced/multi-task-learning.rst
@@ -335,7 +335,7 @@ will also save the normalized node embeddings under ``save_embed_path``.
 The task specific node embeddings are saved separately under different sub-directories
 named with the corresponding task id. (A task id is formated as ``<task_type>-<ntype/etype(s)>-<label>``.
 For instance, the task id of a node classification task on the node type ``paper`` with the
-label filed ``venue`` will be ``node_classification-paper-venue``. As another example,
+label field ``venue`` will be ``node_classification-paper-venue``. As another example,
 the task id of a link prediction task on the edge type ``(paper, cite, paper)`` will be
 ``link_prediction-paper_cite_paper``
 and the task id of a edge regression task on the edge type ``(paper, cite, paper)`` with

From 957d1c7cf101951a3e02fb156a492b1765672078 Mon Sep 17 00:00:00 2001
From: "xiang song(charlie.song)" <classicxsong@gmail.com>
Date: Tue, 13 Aug 2024 21:12:21 -0700
Subject: [PATCH 7/9] Update docs/source/advanced/multi-task-learning.rst

Co-authored-by: Jian Zhang (James) <6593865@qq.com>
---
 docs/source/advanced/multi-task-learning.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/advanced/multi-task-learning.rst b/docs/source/advanced/multi-task-learning.rst
index 6c48b552f3..abdc47e1db 100644
--- a/docs/source/advanced/multi-task-learning.rst
+++ b/docs/source/advanced/multi-task-learning.rst
@@ -342,7 +342,7 @@ and the task id of a edge regression task on the edge type ``(paper, cite, paper
 the label field ``year`` will be ``edge_regression-paper_cite_paper-year``).
 The output format of task specific node embeddings follows
 the :ref:`GraphStorm saved node embeddings format<gs-out-embs>`.
-The ``save_embed_path`` in multi-task learning will look like following:
+The contents of the ``save_embed_path`` in multi-task learning will look like the following:
 
 .. code-block:: bash
 

From b7771e65c8989c26aa8ec61c59f6a3f61c359ec2 Mon Sep 17 00:00:00 2001
From: Xiang Song <xiangsx@amazon.com>
Date: Tue, 13 Aug 2024 22:16:57 -0700
Subject: [PATCH 8/9] update

---
 docs/source/advanced/multi-task-learning.rst  | 14 ++--
 .../cli/model-training-inference/output.rst   | 79 +++++++++++++++----
 2 files changed, 69 insertions(+), 24 deletions(-)

diff --git a/docs/source/advanced/multi-task-learning.rst b/docs/source/advanced/multi-task-learning.rst
index abdc47e1db..d92fb4f12e 100644
--- a/docs/source/advanced/multi-task-learning.rst
+++ b/docs/source/advanced/multi-task-learning.rst
@@ -395,24 +395,24 @@ The contents of  the ``save_embed_path`` in multi-task learning will look like f
                 ...
             emb_info.json
 
-In the above example both the link prediction task and the edge regression
+In the above example both the link prediction and edge regression tasks
 apply task specific normalization on node embeddings.
 
 **Note: The built-in GraphStorm training or inference pipeline
-(launched by GraphStorm CLI) will process each saved node embeddings
-to convert the integer node ids into the raw node ids, which are usually string node ids..**
+(launched by GraphStorm CLIs) will process each set of saved node embeddings
+to convert the integer node IDs into the raw node IDs, which are usually string node IDs.**
 Details can be found in :ref:`GraphStorm Output Node ID Remapping<output-remapping>`
 
 Saved Prediction Results
 ~~~~~~~~~~~~~~~~~~~~~~~~~
-When ``save_prediction_path`` is provided in the inference condig,
+When ``save_prediction_path`` is provided in the inference configuration,
 GraphStorm will save the prediction results in the corresponding path.
 In multi-task learning inference, each prediction task will have its prediction
 results saved separately under different sub-directories
 named with the
 corresponding task id. The output format of task specific prediction results
 follows the :ref:`GraphStorm saved prediction result format<gs-out-predictions>`.
-The ``save_prediction_path`` in multi-task learning will look like following:
+The contents of the ``save_prediction_path`` in multi-task learning will look like the following:
 
 .. code-block:: bash
 
@@ -441,6 +441,6 @@ The ``save_prediction_path`` in multi-task learning will look like following:
         ...
 
 **Note: The built-in GraphStorm inference pipeline
-(launched by GraphStorm CLI) will process each saved prediction result
-to convert the integer node ids into the raw node ids, which are usually string node ids.**
+(launched by GraphStorm CLIs) will process each saved prediction result
+to convert the integer node IDs into the raw node IDs, which are usually string node IDs.**
 Details can be found in :ref:`GraphStorm Output Node ID Remapping<output-remapping>`
diff --git a/docs/source/cli/model-training-inference/output.rst b/docs/source/cli/model-training-inference/output.rst
index 42041f0855..027f5c4c95 100644
--- a/docs/source/cli/model-training-inference/output.rst
+++ b/docs/source/cli/model-training-inference/output.rst
@@ -2,17 +2,57 @@
 
 GraphStorm Output
 =================
+The GraphStorm training pipeline can save both trained model checkpoints and node embeddings
+to disk. When ``save_model_path`` is provided in the training configuration,
+the trained model checkpoints will be saved in the corresponding path.
+The contents of the ``save_model_path`` will look like the following:
+
+.. code-block:: bash
+
+    model_dir/
+        epoch-0-iter-1099/
+        epoch-0-iter-2099/
+        epoch-0/
+        epoch-1-iter-1099/
+        epoch-1-iter-2099/
+        ...
+
+When ``save_embed_path`` is provided in the training configuration,
+the node embeddings produced by the best model checkpoint will be saved
+in the corresponding path. When the training task is launched by
+GraphStorm CLIs, a node ID remapping process is launched automatically
+after the training job to process the saved node embeddings and the corresponding node IDs. The final output of node
+embeddings will be in parquet format by default. Details can be found in :ref:`GraphStorm Output Node ID Remapping<output-remapping>`.
+
+The GraphStorm inference pipeline can save both node embeddings and prediction
+results to disk. When ``save_embed_path`` is provided in the inference configuration,
+the node embeddings will be saved in the same way as in the GraphStorm training pipeline.
+When ``save_prediction_path`` is provided in the inference configuration,
+GraphStorm will save the prediction results in the corresponding path.
+When the inference task is launched by GraphStorm CLIs, a node ID remapping
+process is launched automatically after the inference job to
+process the saved prediction results and the corresponding node IDs.
+The final output of prediction results will be in parquet format by default.
+Details can be found in :ref:`GraphStorm Output Node ID Remapping<output-remapping>`.
+
+
+The following sections introduce how the node embeddings and prediction
+results are saved by the GraphStorm training and inference scripts.
+In most end-to-end training and inference cases, the saved
+files, usually in ``.pt`` format, are not consumable by downstream
+applications. The :ref:`GraphStorm Output Node ID Remapping<output-remapping>` must be invoked to process the output files.
+
 
 .. _gs-output-embs:
 
 Saved Node Embeddings
 ---------------------
-When ``save_embed_path`` is provided in the training config or inference condig,
+When ``save_embed_path`` is provided in the training configuration or the inference configuration,
 GraphStorm will save the node embeddings in the corresponding path. The node embeddings
 of each node type are saved separately under different sub-directories named with
 the corresponding node types. GraphStorm will also save an ``emb_info.json`` file,
-which contains all the metadata for the saved node embeddings. The ``save_embed_path``
-will look like following:
+which contains all the metadata for the saved node embeddings.
+The contents of the ``save_embed_path`` will look like the following:
 
 .. code-block:: bash
 
@@ -36,7 +76,7 @@ will look like following:
 
 The ``embed_nids-*`` files store the integer node IDs of each node embedding and
 the ``embed-*`` files store the corresponding node embeddings.
-The content of ``embed_nids-*`` files and ``embed-*`` files looks like:
+The contents of ``embed_nids-*`` files and ``embed-*`` files look like:
 
 .. code-block::
 
@@ -51,22 +91,25 @@ The content of ``embed_nids-*`` files and ``embed-*`` files looks like:
 The ``emb_info.json`` stores three informations:
   * ``format``: The format of the saved embeddings. By default, it is ``pytorch``.
   * ``emb_name``: A list of node types that have node embeddings saved. For example: ["ntype0", "ntype1"]
-  * ``world_size``: The number of chunks (files) into which the node embeddings of a particular node type are divided. For instance, if world_size is set to 8, there will be 8 files for each set of node embeddings."
+  * ``world_size``: The number of chunks (files) into which the node embeddings of a particular node type are divided. For instance, if ``world_size`` is set to 8, there will be 8 files for each node type's node embeddings.
 
 **Note: The built-in GraphStorm training or inference pipeline
-(launched by GraphStorm CLI) will process the saved node embeddings
-to convert the integer node ids into the raw node ids, which are usually string node ids..**
-Details can be found in :ref:`GraphStorm Output Node ID Remapping<output-remapping>`
+(launched by GraphStorm CLIs) will process the saved node embeddings
+to convert the integer node IDs into the raw node IDs, which are usually
+string node IDs. The final output will be in parquet format by default,
+and the node embedding files, i.e., ``embed-*.pt`` files, and node ID
+files, i.e., ``embed_nids-*.pt`` files, will be removed.** Details can be
+found in :ref:`GraphStorm Output Node ID Remapping<output-remapping>`.
 
-.. _gs-output-predictions:
+.. _gs-out-predictions:
 
 Saved Prediction Results
 ------------------------
-When ``save_prediction_path`` is provided in the inference condig,
+When ``save_prediction_path`` is provided in the inference configuration,
 GraphStorm will save the prediction results in the corresponding path.
 For node prediction tasks, the prediction results are saved per node type.
 GraphStorm will also save an ``result_info.json`` file, which contains all
-the metadata for the saved prediction results. The ``save_prediction_path``
+the metadata for the saved prediction results. The contents of the ``save_prediction_path``
 will look like following:
 
 .. code-block:: bash
@@ -106,7 +149,7 @@ The content of ``predict_nids-*`` files and ``predict-*`` files looks like:
 The ``result_info.json`` stores three informations:
   * ``format``: The format of the saved prediction results. By default, it is ``pytorch``.
   * ``emb_name``: A list of node types that have node prediction results saved. For example: ["ntype0", "ntype1"]
-  * ``world_size``: The number of chunks (files) into which the prediction results of a particular node type are divided. For instance, if world_size is set to 8, there will be 8 files for each set of prediction results."
+  * ``world_size``: The number of chunks (files) into which the prediction results of a particular node type are divided. For instance, if ``world_size`` is set to 8, there will be 8 files for each node type's prediction results.
 
 
 For edge prediction tasks, the prediction results are saved per edge type.
@@ -114,7 +157,7 @@ The sub-directory for an edge type is named as ``<src_ntype>_<relation_type>_<ds
 For instance, given an edge type ``("movie","rated-by","user")``, the corresponding
 sub-directory is named as ``movie_rated-by_user``.
 GraphStorm will also save an ``result_info.json`` file, which contains all
-the metadata for the saved prediction results. The ``save_prediction_path``
+the metadata for the saved prediction results. The contents of the ``save_prediction_path``
 will look like following:
 
 .. code-block:: bash
@@ -161,9 +204,11 @@ The content of ``src_nids-*``, ``dst_nids-*`` and ``predict-*`` files looks like
 The ``result_info.json`` stores three informations:
   * ``format``: The format of the saved prediction results. By default, it is ``pytorch``.
   * ``etypes``: A list of edge types that have edge prediction results saved. For example: [("movie","rated-by","user"), ("user","watched","movie")]
-  * ``world_size``: The number of chunks (files) into which the prediction results of a particular edge type are divided. For instance, if world_size is set to 8, there will be 8 files for each set of prediction results."
+  * ``world_size``: The number of chunks (files) into which the prediction results of a particular edge type are divided. For instance, if world_size is set to 8, there will be 8 files for each edge type's prediction results."
 
 **Note: The built-in GraphStorm inference pipeline
-(launched by GraphStorm CLI) will process the saved prediction results
-to convert the integer node ids into the raw node ids, which are usually string node ids.**
-Details can be found in :ref:`GraphStorm Output Node ID Remapping<output-remapping>`
+(launched by GraphStorm CLIs) will process the saved prediction results
+to convert the integer node IDs into the raw node IDs, which are usually string node IDs. The final output will be in parquet format by default.
+And the prediction files, i.e.,``predict-*.pt`` files, and node ID files,
+i.e.,``predict_nids-*.pt``, ``src_nids-*.pt``, and ``dst_nids-*.pt`` files
+will be removed.** Details can be found in :ref:`GraphStorm Output Node ID Remapping<output-remapping>`

From 9dbbe128392d744d9b1845ebb746584ce8231f0b Mon Sep 17 00:00:00 2001
From: Xiang Song <xiangsx@amazon.com>
Date: Wed, 14 Aug 2024 17:58:43 -0700
Subject: [PATCH 9/9] Update

---
 docs/source/advanced/multi-task-learning.rst  | 10 +++---
 .../cli/model-training-inference/output.rst   | 31 ++++++++++---------
 2 files changed, 21 insertions(+), 20 deletions(-)
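
Reviewer note: the edge prediction layout documented in output.rst
(a ``<src_ntype>_<relation_type>_<dst_ntype>`` sub-directory per edge type,
chunked into ``world_size`` files) can be sketched as below. This is only an
illustration, not GraphStorm code: the helper name
``expected_edge_prediction_files``, the ``/tmp/predictions`` path, and the
sample metadata are hypothetical. It also assumes that chunk indices run from
0 to ``world_size - 1`` with five-digit zero padding, as the
``predict-00000.pt`` example suggests, and that the edge types in
``result_info.json`` load as three-element lists. It describes the raw output
only, before the node ID remapping removes the ``.pt`` files.

```python
import json
from pathlib import Path


def expected_edge_prediction_files(save_prediction_path, result_info):
    """List the per-chunk files a reader would expect for each edge type.

    Assumption: chunk indices run from 0 to world_size - 1 and are
    zero-padded to five digits, as in the ``predict-00000.pt`` example.
    """
    root = Path(save_prediction_path)
    files = {}
    for src_ntype, rel_type, dst_ntype in result_info["etypes"]:
        # Sub-directory name follows <src_ntype>_<relation_type>_<dst_ntype>.
        subdir = root / f"{src_ntype}_{rel_type}_{dst_ntype}"
        files[(src_ntype, rel_type, dst_ntype)] = [
            subdir / f"{prefix}-{rank:05d}.pt"
            for rank in range(result_info["world_size"])
            for prefix in ("src_nids", "dst_nids", "predict")
        ]
    return files


# Hypothetical metadata mirroring the three fields described in output.rst.
info = json.loads(
    '{"format": "pytorch",'
    ' "etypes": [["movie", "rated-by", "user"]],'
    ' "world_size": 2}'
)
paths = expected_edge_prediction_files("/tmp/predictions", info)
for etype, chunk_files in paths.items():
    print(etype, [p.name for p in chunk_files])
```

The helper only builds path names and never touches the disk, so it can be
used to check a ``save_prediction_path`` for missing chunks before any
downstream processing.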

diff --git a/docs/source/advanced/multi-task-learning.rst b/docs/source/advanced/multi-task-learning.rst
index d92fb4f12e..c6d68eb7c8 100644
--- a/docs/source/advanced/multi-task-learning.rst
+++ b/docs/source/advanced/multi-task-learning.rst
@@ -331,7 +331,7 @@ produced by the GNN layer for every node type under the path specified by
 format<gs-out-embs>`. Meanwhile, in multi-task learning, certain tasks might apply
 task specific normalization to node embeddings. For instance, a link prediction
 task might apply l2 normalization on each node embeddings. In certain cases, GraphStorm
-will also save the normalized node embeddings under ``save_embed_path``.
+will also save the normalized node embeddings under the ``save_embed_path``.
 The task specific node embeddings are saved separately under different sub-directories
 named with the corresponding task id. (A task id is formated as ``<task_type>-<ntype/etype(s)>-<label>``.
 For instance, the task id of a node classification task on the node type ``paper`` with the
@@ -342,7 +342,7 @@ and the task id of a edge regression task on the edge type ``(paper, cite, paper
 the label field ``year`` will be ``edge_regression-paper_cite_paper-year``).
 The output format of task specific node embeddings follows
 the :ref:`GraphStorm saved node embeddings format<gs-out-embs>`.
-The contents of  the ``save_embed_path`` in multi-task learning will look like following:
+The contents of the ``save_embed_path`` in multi-task learning will look like the following:
 
 .. code-block:: bash
 
@@ -400,8 +400,8 @@ apply task specific normalization on node embeddings.
 
 **Note: The built-in GraphStorm training or inference pipeline
 (launched by GraphStorm CLIs) will process each saved node embeddings
-to convert the integer node IDs into the raw node IDs, which are usually string node IDs..**
-Details can be found in :ref:`GraphStorm Output Node ID Remapping<output-remapping>`
+to convert the integer node IDs into the raw node IDs, which are usually string node IDs.**
+Details can be found in :ref:`GraphStorm Output Node ID Remapping<gs-output-remapping>`
 
 Saved Prediction Results
 ~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -443,4 +443,4 @@ The contents of the ``save_prediction_path`` in multi-task learning will look li
 **Note: The built-in GraphStorm inference pipeline
 (launched by GraphStorm CLIs) will process each saved prediction result
 to convert the integer node IDs into the raw node IDs, which are usually string node IDs.**
-Details can be found in :ref:`GraphStorm Output Node ID Remapping<output-remapping>`
+Details can be found in :ref:`GraphStorm Output Node ID Remapping<gs-output-remapping>`
diff --git a/docs/source/cli/model-training-inference/output.rst b/docs/source/cli/model-training-inference/output.rst
index 027f5c4c95..0095746fbb 100644
--- a/docs/source/cli/model-training-inference/output.rst
+++ b/docs/source/cli/model-training-inference/output.rst
@@ -18,14 +18,14 @@ The contents of the ``save_model_path`` will look like following:
         ...
 
 When ``save_embed_path`` is provided in the training configuration,
-the node embeddings produced by the bset model checkpoint will be saved
+the node embeddings produced by the best model checkpoint will be saved
 in the corresponding path. When the training task is launched by
-GraphStorm CLIs, a ndoe ID remapping process will be launched
+GraphStorm CLIs, a node ID remapping process will be launched
 automatically, after the training job, to process the saved node embeddings and the corresponding node IDs. The final output of node
-embeddings will be in parquet format by default. Details can be found in :ref:`GraphStorm Output Node ID Remapping<output-remapping>`
+embeddings will be in parquet format by default. Details can be found in :ref:`GraphStorm Output Node ID Remapping<gs-output-remapping>`
 
 GraphStorm inference pipeline can save both node embeddings and prediction
-results on disk. When ``save_embed_path`` is provided in the inference configuration,
+results on disk. When ``save_embed_path`` is provided in the inference configurations,
 the node embeddings will be saved in the same way as GraphStorm training pipeline.
 When ``save_prediction_path`` is provided in the inference configurations,
 GraphStorm will save the prediction results in the corresponding path.
@@ -33,14 +33,15 @@ When the inference task is launched by GraphStorm CLIs, a ndoe ID remapping
 process will be launched automatically, after the inference job, to
 process the saved prediction results and the corresponding node IDs.
 The final output of prediction results will be in parquet format by default.
-Details can be found in :ref:`GraphStorm Output Node ID Remapping<output-remapping>`
+Details can be found in :ref:`GraphStorm Output Node ID Remapping<gs-output-remapping>`
 
 
 The following sections will introduce how the node embeddings and prediction
 results are saved by the GraphStorm training and inference scripts.
-In most of the end-to-end training and inference cases, the saved
-files, usually in ``.pt`` format, are not consumable by the downstream
-applications. The :ref:`GraphStorm Output Node ID Remapping<output-remapping>` must be invoked to process the output files.
+
+.. note::
+
+    In most end-to-end training and inference cases, the saved files, usually in ``.pt`` format, cannot be consumed directly by downstream applications. The :ref:`GraphStorm Output Node ID Remapping<gs-output-remapping>` must be invoked to process the output files.
 
 
 .. _gs-output-embs:
@@ -88,7 +89,7 @@ The contents of ``embed_nids-*`` files and ``embed-*`` files look like:
     23                   |   0.472,0.432,-0.732,...
     ...
 
-The ``emb_info.json`` stores three informations:
+The ``emb_info.json`` stores three types of information:
   * ``format``: The format of the saved embeddings. By default, it is ``pytorch``.
   * ``emb_name``: A list of node types that have node embeddings saved. For example: ["ntype0", "ntype1"]
   * ``world_size``: The number of chunks (files) into which the node embeddings of a particular node type are divided. For instance, if world_size is set to 8, there will be 8 files for each node type's node embeddings."
@@ -99,7 +100,7 @@ to convert the integer node IDs into the raw node IDs, which are usually
 string node IDs. The final output will be in parquet format by default.
 And the node embedding files, i.e.,``embed-*.pt`` files, and node ID
 files, i.e.,``embed_nids-*.pt`` files, will be removed.** Details can be
-found in :ref:`GraphStorm Output Node ID Remapping<output-remapping>`
+found in :ref:`GraphStorm Output Node ID Remapping<gs-output-remapping>`
 
 .. _gs-out-predictions:
 
@@ -138,7 +139,7 @@ The content of ``predict_nids-*`` files and ``predict-*`` files looks like:
 
 .. code-block::
 
-    predict_nids-00000.pt  |   predict.pt
+    predict_nids-00000.pt  |   predict-00000.pt
                            |
     Graph Node ID          |   Prediction results
     10                     |   0.112
@@ -146,7 +147,7 @@ The content of ``predict_nids-*`` files and ``predict-*`` files looks like:
     23                     |   0.472
     ...
 
-The ``result_info.json`` stores three informations:
+The ``result_info.json`` stores three types of information:
   * ``format``: The format of the saved prediction results. By default, it is ``pytorch``.
   * ``emb_name``: A list of node types that have node prediction results saved. For example: ["ntype0", "ntype1"]
   * ``world_size``: The number of chunks (files) into which the prediction results of a particular node type are divided. For instance, if world_size is set to 8, there will be 8 files for each node type's prediction results."
@@ -193,7 +194,7 @@ The content of ``src_nids-*``, ``dst_nids-*`` and ``predict-*`` files looks like
 
 .. code-block::
 
-    src_nids-00000.pt   |   dst_nids-00000.pt   |   predict.pt
+    src_nids-00000.pt   |   dst_nids-00000.pt   |   predict-00000.pt
                         |
     Source Node ID      |   Destination Node ID |   Prediction results
     10                  |   12                  |   0.112
@@ -201,7 +202,7 @@ The content of ``src_nids-*``, ``dst_nids-*`` and ``predict-*`` files looks like
     23                  |   3                   |   0.472
     ...
 
-The ``result_info.json`` stores three informations:
+The ``result_info.json`` stores three types of information:
   * ``format``: The format of the saved prediction results. By default, it is ``pytorch``.
   * ``etypes``: A list of edge types that have edge prediction results saved. For example: [("movie","rated-by","user"), ("user","watched","movie")]
   * ``world_size``: The number of chunks (files) into which the prediction results of a particular edge type are divided. For instance, if world_size is set to 8, there will be 8 files for each edge type's prediction results."
@@ -211,4 +212,4 @@ The ``result_info.json`` stores three informations:
 to convert the integer node IDs into the raw node IDs, which are usually string node IDs. The final output will be in parquet format by default.
 And the prediction files, i.e.,``predict-*.pt`` files, and node ID files,
 i.e.,``predict_nids-*.pt``, ``src_nids-*.pt``, and ``dst_nids-*.pt`` files
-will be removed.** Details can be found in :ref:`GraphStorm Output Node ID Remapping<output-remapping>`
+will be removed.** Details can be found in :ref:`GraphStorm Output Node ID Remapping<gs-output-remapping>`