[Document] Update doc about saved prediction results and embeddings #665

Merged: 1 commit, merged Nov 27, 2023
docs/source/tutorials/quick-start.rst: 18 changes (6 additions, 12 deletions)
@@ -140,7 +140,7 @@ The inference command is:
--save-prediction-path /tmp/ogbn-arxiv-nc/predictions/ \
--restore-model-path /tmp/ogbn-arxiv-nc/models/epoch-7/

-This inference command predicts the classes of nodes in the testing set and saves the results, a Pytorch tensor file named "**predict-00000.pt**", into the ``/tmp/ogbn-arxiv-nc/predictions/`` folder.
+This inference command predicts the classes of nodes in the testing set and saves the results, a list of parquet files named **predict-00000_00000.parquet**, **predict-00001_00000.parquet**, ..., into the ``/tmp/ogbn-arxiv-nc/predictions/node/`` folder. Each parquet file has two columns, `nid` column for storing node IDs and `pred` column for storing prediction results.

Inference on link prediction is similar as shown in the command below.
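With predictions now split across several parquet files, a downstream script has to collect them before loading. The sketch below is a hypothetical helper, not part of GraphStorm, and it assumes only the layout described in the updated text: files named ``predict-XXXXX_YYYYY.parquet`` inside a ``node/`` subfolder of the ``--save-prediction-path`` directory.

```python
from pathlib import Path
from typing import List


def list_node_prediction_files(pred_root: str, prefix: str = "predict") -> List[Path]:
    """Return the prediction parquet files under <pred_root>/node/, sorted by name.

    Hypothetical helper. Assumes the layout <pred_root>/node/predict-XXXXX_YYYYY.parquet.
    Because the indices are zero-padded, sorting by name also sorts them numerically.
    A missing node/ folder simply yields an empty list.
    """
    node_dir = Path(pred_root) / "node"
    return sorted(node_dir.glob(f"{prefix}-*.parquet"))
```

For the quick-start run this would be called as ``list_node_prediction_files("/tmp/ogbn-arxiv-nc/predictions/")``; each returned file can then be opened with a parquet reader such as ``pandas.read_parquet`` to obtain the ``nid`` and ``pred`` columns.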

@@ -159,7 +159,7 @@ Inference on link prediction is similar as shown in the command below.
--save-embed-path /tmp/ogbn-arxiv-lp/predictions/ \
--restore-model-path /tmp/ogbn-arxiv-lp/models/epoch-2/

-The inference outputs include a **"emb_info.json"** metadata file and the prediction result file, **"embed-00000.pt"** in the ``/tmp/ogbn-arxiv-lp/predictions/`` folder.
+The inference outputs the saved embeddings, a list of parquet files named **embed-00000_00000.parquet**, **embed-00001_00000.parquet**, ..., in the ``/tmp/ogbn-arxiv-lp/predictions/node/`` folder. Each parquet file has two columns, `nid` column for storing node IDs and `emb` column for storing embeddings.

Generating Embedding
--------------------
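The new file names carry two zero-padded indices, for example ``embed-00000_00001.parquet``. The sketch below parses such a name into its parts. It is a hypothetical helper, not a GraphStorm API; reading the first index as a part number and the second as a chunk within that part is an assumption inferred from how ``embed-00000.pt`` is replaced by ``embed-00000_00000.parquet`` and ``embed-00000_00001.parquet`` in this PR.

```python
import re
from typing import NamedTuple

# Matches names such as "embed-00000_00001.parquet" or "predict-00001_00000.parquet".
_PART_FILE_RE = re.compile(r"^(?P<prefix>[a-z_]+)-(?P<part>\d+)_(?P<chunk>\d+)\.parquet$")


class PartFile(NamedTuple):
    prefix: str  # "embed" or "predict"
    part: int    # first index (assumed: the part the file came from)
    chunk: int   # second index (assumed: the chunk within that part)


def parse_part_file(name: str) -> PartFile:
    """Split a saved-result file name into prefix, part index and chunk index."""
    match = _PART_FILE_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    return PartFile(match["prefix"], int(match["part"]), int(match["chunk"]))
```

For instance, ``parse_part_file("embed-00001_00000.parquet")`` returns ``PartFile(prefix='embed', part=1, chunk=0)``, while an old-style name such as ``embed-00000.pt`` raises ``ValueError``.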
@@ -201,18 +201,12 @@ The saved result will be like:
/tmp/saved_embed
emb_info.json
node_type1/
-embed_nids-00000.pt
-embed_nids-00001.pt
-...
-embed-00000.pt
-embed-00001.pt
+embed-00000_00000.parquet
+embed-00000_00001.parquet
...
node_type2/
-embed_nids-00000.pt
-embed_nids-00001.pt
-...
-embed-00000.pt
-embed-00001.pt
+embed-00000_00000.parquet
+embed-00000_00001.parquet
...

**That is it!** You have learnt how to use GraphStorm in three steps.
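The new ``/tmp/saved_embed`` layout keeps ``emb_info.json`` at the top level and puts one subfolder per node type, each holding ``embed-*.parquet`` files. The sketch below groups those files by node type. It is a hypothetical helper, not part of GraphStorm, and it assumes exactly that folder layout.

```python
from pathlib import Path
from typing import Dict, List


def collect_embedding_files(save_embed_path: str) -> Dict[str, List[Path]]:
    """Map each node-type subfolder to its sorted embed-*.parquet files.

    Hypothetical helper. Top-level files such as emb_info.json are skipped,
    and subfolders without any embed-*.parquet file are left out.
    """
    root = Path(save_embed_path)
    files: Dict[str, List[Path]] = {}
    for ntype_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        parts = sorted(ntype_dir.glob("embed-*.parquet"))
        if parts:
            files[ntype_dir.name] = parts
    return files
```

Keying the result by folder name keeps node types separate, which matches the per-node-type folders shown in the tutorial.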
python/graphstorm/gconstruct/remap_result.py: 10 changes (5 additions, 5 deletions)
@@ -210,12 +210,12 @@ def remap_node_emb(emb_ntypes, node_emb_dir,
--------
# embedddings:
# ntype0:
-# emb_part00000_00000.parquet
-# emb_part00000_00001.parquet
+# embed-00000_00000.parquet
+# embed-00000_00001.parquet
# ...
# ntype1:
-# emb_part00000_00000.parquet
-# emb_part00000_00001.parquet
+# embed-00000_00000.parquet
+# embed-00000_00001.parquet
# ...

Parameters
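This docstring change renames the remapped embedding files from ``emb_partXXXXX_YYYYY.parquet`` to ``embed-XXXXX_YYYYY.parquet``. The sketch below rebuilds names in the new pattern for one part. It is a hypothetical helper, not the code used by ``remap_node_emb``; the five-digit zero padding is an assumption read off the example names in the docstring.

```python
from typing import List


def part_file_names(prefix: str, part: int, num_chunks: int) -> List[str]:
    """Build names like "embed-00000_00000.parquet" for one part.

    Hypothetical helper; the 5-digit padding mirrors the docstring examples.
    """
    if part < 0 or num_chunks < 0:
        raise ValueError("part and num_chunks must be non-negative")
    return [f"{prefix}-{part:05d}_{chunk:05d}.parquet" for chunk in range(num_chunks)]
```

Calling ``part_file_names("embed", 0, 2)`` produces the two names listed under ``ntype0`` in the docstring.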
@@ -400,7 +400,7 @@ def remap_edge_pred(pred_etypes, pred_dir,
# dst_nids-00001.pt
# ...

-The output emb files will be
+The output prediction files will be
# predict-00000_00000.parquet
# predict-00000_00001.parquet
# ...
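Because each remapped prediction file name has a part index and a chunk index, a consumer may want to regroup the output files by part. The sketch below does this for ``predict-*.parquet`` names. It is a hypothetical helper, not a GraphStorm API, and treating the first index as the part number is an assumption, not something this docstring states.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

_PRED_RE = re.compile(r"^predict-(\d+)_(\d+)\.parquet$")


def group_predictions_by_part(names: Iterable[str]) -> Dict[int, List[str]]:
    """Group predict-XXXXX_YYYYY.parquet names by their first (assumed part) index."""
    groups: Dict[int, List[tuple]] = defaultdict(list)
    for name in names:
        match = _PRED_RE.match(name)
        if match is None:
            continue  # skip anything that is not a remapped prediction file
        groups[int(match.group(1))].append((int(match.group(2)), name))
    # Order parts ascending and chunks within each part by their second index.
    return {part: [name for _, name in sorted(chunks)] for part, chunks in sorted(groups.items())}
```

Files that do not follow the ``predict-XXXXX_YYYYY.parquet`` pattern are ignored rather than rejected, so the helper can be pointed at a directory listing that also contains other files.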