[Doc] Add Doc String for graphstorm.dataloading.dataloading #471

Closed
wants to merge 26 commits into from
41b36cf
add doc string for dataloaders
GentleZhu Sep 24, 2023
44aced7
Fix the logging (#472)
zheng-da Sep 24, 2023
4405ec4
Enable hdf5 embed writing (#463)
HouyuZhang1007 Sep 25, 2023
732194d
precision_recall need set return_proba=True (#451)
CongWeilin Sep 25, 2023
3f7b81a
Support load conf for min/max transformation. (#473)
classicsong Sep 25, 2023
72bbe9a
[Bug Fix] Disable storing embeddings when inferring on ec/er task (#477)
jalencato Sep 25, 2023
603d481
[GSProcessing] Improve handling of configs with unknown version. (#480)
thvasilo Sep 25, 2023
776ad95
Updated MAG example with GLEM instructions (#474)
wangz10 Sep 26, 2023
c7606a6
Fix a bug in restoring models in the inference script (#482)
zheng-da Sep 26, 2023
a0d7e8b
[Doc] Reorg API Documentation part-1 (#460)
zhjwy9343 Sep 26, 2023
1905464
Enable nccl for embedding save (#484)
classicsong Sep 27, 2023
3228b97
Use zfill to make sure the saved embeddings or prediction results are…
classicsong Sep 27, 2023
1a26d12
[Bug Fix] Change on EC test (#491)
jalencato Sep 27, 2023
dbf92e7
fix comments
GentleZhu Sep 27, 2023
7bce306
[BugFix] Handle the case when user provides test_mask but do not want…
classicsong Sep 27, 2023
5a4cb98
Fix the problem of caching twice (#493)
zheng-da Sep 27, 2023
b19bdcf
[GSProcessing] Name reverse edges as `dst:relation-rev:src` (#490)
thvasilo Sep 27, 2023
58262c6
Fix lint does not work with astroid==3.0 error. (#496)
classicsong Sep 27, 2023
74c99ec
[Wholegraph] Build docker image for WholeGraph-GraphStorm (#485)
isratnisa Sep 27, 2023
4c825d0
[Bug Fix] Add test to EC (#497)
jalencato Sep 27, 2023
87aa152
[GSProcessing] Add GSProcessing documentation (#467)
thvasilo Sep 27, 2023
06f5c96
[Doc] Add Doc String for graphstorm.dataloading.dataset (#470)
GentleZhu Sep 27, 2023
e66cfcd
add doc string for dataloaders
GentleZhu Sep 24, 2023
09b7b97
fix comments
GentleZhu Sep 27, 2023
0e95964
Merge branch 'dataloader-doc' of https://github.com/awslabs/graphstor…
GentleZhu Sep 28, 2023
a31f359
add line before and after code::
GentleZhu Sep 28, 2023
52 changes: 51 additions & 1 deletion python/graphstorm/dataloading/dataloading.py
@@ -158,6 +158,9 @@ def fanout(self):
class GSgnnEdgeDataLoader(GSgnnEdgeDataLoaderBase):
""" The minibatch dataloader for edge prediction

GSgnnEdgeDataLoader turns a GraphStorm edge dataset into an iterable over mini-batches
of samples. Both the source and destination nodes of the target edges are included in ``batch_graph``.

Parameters
------------
dataset: GSgnnEdgeData
@@ -182,6 +185,20 @@ class GSgnnEdgeDataLoader(GSgnnEdgeDataLoaderBase):
The node types whose node features need to be constructed.
construct_feat_fanout : int
The fanout required to construct node features.

Examples
------------
To train a 2-layer GNN for edge prediction on a set of edges ``target_idx`` in
a homogeneous graph, where each node takes messages from 15 neighbors in the first layer
and 10 neighbors in the second:

.. code:: python

    from graphstorm.dataloading import GSgnnEdgeTrainData
    from graphstorm.dataloading import GSgnnEdgeDataLoader

    ep_data = GSgnnEdgeTrainData(...)
    ep_dataloader = GSgnnEdgeDataLoader(ep_data, target_idx, fanout=[15, 10], batch_size=128)
    for input_nodes, batch_graph, blocks in ep_dataloader:
        train_on(input_nodes, batch_graph, blocks)
"""
def __init__(self, dataset, target_idx, fanout, batch_size, device='cpu',
train_task=True, reverse_edge_types_map=None,
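
The `fanout=[15, 10]` argument in the docstring example above can be read as one neighbor budget per GNN layer. The following is a minimal pure-Python sketch of layer-wise neighbor sampling, not GraphStorm's or DGL's implementation. It assumes the DGL convention that the i-th fanout entry belongs to the i-th GNN layer, so the last entry is applied to the seed nodes first. The names `sample_blocks` and `toy_adj` are hypothetical and exist only for illustration.

```python
import random


def sample_blocks(adj, seeds, fanouts, rng):
    """Sample one list of (src, dst) edges per GNN layer, working outward from the seeds.

    adj maps a node to the list of its in-neighbors. fanouts[i] is the number
    of neighbors kept per node for the i-th GNN layer (assumed convention).
    """
    blocks = []
    frontier = list(seeds)
    for fanout in reversed(fanouts):
        edges = []
        for dst in frontier:
            neighbors = adj.get(dst, [])
            picked = rng.sample(neighbors, min(fanout, len(neighbors)))
            edges.extend((src, dst) for src in picked)
        # Insert at the front so that blocks[0] feeds the first GNN layer.
        blocks.insert(0, edges)
        # The next frontier holds the current nodes plus the newly reached sources.
        frontier = list(dict.fromkeys(frontier + [src for src, _ in edges]))
    return frontier, blocks


# Hypothetical toy graph: 100 nodes, each with 20 in-neighbors.
toy_adj = {n: [(n + k) % 100 for k in range(1, 21)] for n in range(100)}
input_nodes, blocks = sample_blocks(
    toy_adj, seeds=[0, 1], fanouts=[15, 10], rng=random.Random(0)
)
```

In this sketch the two seed nodes each receive 10 sampled neighbors in the last layer, and every node reached so far receives 15 in the first layer, which is why `input_nodes` grows quickly with the number of layers.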
@@ -357,7 +374,9 @@ def target_eidx(self):
class GSgnnLinkPredictionDataLoader(GSgnnLinkPredictionDataLoaderBase):
""" Link prediction minibatch dataloader

The negative edges are sampled uniformly.
GSgnnLinkPredictionDataLoader turns a GraphStorm edge dataset into an iterable over mini-batches
of samples. In each batch, ``pos_graph`` and ``neg_graph`` are the sampled subgraphs for the positive
and negative edges, respectively. Given the positive edges, the negative edges are sampled uniformly.

Parameters
----------
@@ -387,6 +406,21 @@ class GSgnnLinkPredictionDataLoader(GSgnnLinkPredictionDataLoaderBase):
The node types whose node features need to be constructed.
construct_feat_fanout : int
The fanout required to construct node features.

Examples
------------
To train a 2-layer GNN for link prediction on a set of positive edges ``target_idx`` in
a homogeneous graph, where each node takes messages from 15 neighbors in the first layer
and 10 neighbors in the second. This example uses 10 negative edges per positive edge:

.. code:: python

    from graphstorm.dataloading import GSgnnEdgeTrainData
    from graphstorm.dataloading import GSgnnLinkPredictionDataLoader

    ep_data = GSgnnEdgeTrainData(...)
    ep_dataloader = GSgnnLinkPredictionDataLoader(ep_data, target_idx, fanout=[15, 10],
                                                  num_negative_edges=10, batch_size=128)
    for input_nodes, pos_graph, neg_graph, blocks in ep_dataloader:
        train_on(input_nodes, pos_graph, neg_graph, blocks)
"""
def __init__(self, dataset, target_idx, fanout, batch_size, num_negative_edges, device='cpu',
train_task=True, reverse_edge_types_map=None, exclude_training_targets=False,
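
The docstring above says negative edges are sampled uniformly, with `num_negative_edges` negatives per positive edge. As a rough illustration of that idea, and not the code GraphStorm actually runs, the sketch below keeps the source of each positive edge and draws replacement destinations uniformly from all node IDs. The helper `uniform_negative_edges` and its `num_nodes` and `rng` parameters are hypothetical.

```python
import random


def uniform_negative_edges(pos_edges, num_nodes, num_negative_edges, rng):
    """For each positive (src, dst), pair src with uniformly drawn destinations."""
    neg_edges = []
    for src, _dst in pos_edges:
        for _ in range(num_negative_edges):
            # Uniform draw over all node IDs; a draw may coincide with a real edge.
            neg_edges.append((src, rng.randrange(num_nodes)))
    return neg_edges


pos_edges = [(0, 1), (2, 3), (4, 5)]
neg_edges = uniform_negative_edges(
    pos_edges, num_nodes=50, num_negative_edges=10, rng=random.Random(0)
)
```

Because the draw ignores the graph structure, some sampled "negatives" can be true edges. This sketch does not filter them out.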
@@ -961,6 +995,8 @@ def fanout(self):

class GSgnnNodeDataLoader(GSgnnNodeDataLoaderBase):
""" Minibatch dataloader for node tasks

GSgnnNodeDataLoader turns a GraphStorm node dataset into an iterable over mini-batches of samples.

Parameters
----------
@@ -980,6 +1016,20 @@ class GSgnnNodeDataLoader(GSgnnNodeDataLoaderBase):
The node types whose node features need to be constructed.
construct_feat_fanout : int
The fanout required to construct node features.

Examples
----------
To train a 2-layer GNN for node classification on a set of nodes ``target_idx`` in
a homogeneous graph, where each node takes messages from 15 neighbors in the first layer
and 10 neighbors in the second:

.. code:: python

    from graphstorm.dataloading import GSgnnNodeTrainData
    from graphstorm.dataloading import GSgnnNodeDataLoader

    np_data = GSgnnNodeTrainData(...)
    # device has no default value in GSgnnNodeDataLoader.__init__, so pass it explicitly.
    np_dataloader = GSgnnNodeDataLoader(np_data, target_idx, fanout=[15, 10], batch_size=128,
                                        device='cpu')
    for input_nodes, output_nodes, blocks in np_dataloader:
        train_on(input_nodes, output_nodes, blocks)
"""
def __init__(self, dataset, target_idx, fanout, batch_size, device, train_task=True,
construct_feat_ntype=None, construct_feat_fanout=5):
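
The phrase "an iterable over mini-batches" in the docstrings above can be made concrete with a small sketch of how a set of target indices might be split into batches of `batch_size`. This is an assumption-laden illustration in plain Python, not GraphStorm's dataloader. The helper `iter_minibatches` and its `shuffle` and `rng` parameters are hypothetical.

```python
import random


def iter_minibatches(target_idx, batch_size, shuffle, rng):
    """Yield consecutive slices of target_idx, optionally shuffled first."""
    idx = list(target_idx)
    if shuffle:
        rng.shuffle(idx)
    for start in range(0, len(idx), batch_size):
        # The final batch is shorter when len(idx) is not a multiple of batch_size.
        yield idx[start:start + batch_size]


batches = list(
    iter_minibatches(range(300), batch_size=128, shuffle=True, rng=random.Random(0))
)
```

With 300 targets and `batch_size=128`, the sketch yields two full batches and one final batch of 44 targets. A real dataloader would additionally sample the neighborhood of each batch before yielding it.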