Question on the feature format for generated samples of ENZYMES dataset #11

Open
lizaitang opened this issue Jun 2, 2023 · 9 comments

Comments

@lizaitang

Dear Author, your paper really helps a lot. I have a question: I want to pass the generated graphs to some classifiers, but the node features generated by GDSS seem to differ in scale from those of the original dataset. How can I solve this? Thanks.

@harryjo97
Owner

harryjo97 commented Jun 2, 2023

Hi lizaitang,

In our work, we used the degree of each node as the node features instead of the given node features of the original dataset. In order to use the node features of the original dataset, you can modify the code in

def graphs_to_dataloader(config, graph_list):
and
def init_features(init, adjs=None, nfeat=10):

for loading the node features of the dataset.
After changing these, you can train new score models that generate both the node features and the adjacency matrices.
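As a rough illustration of the degree-based features described above, here is a NumPy sketch of one-hot degree encoding. The function name `degree_one_hot` is hypothetical and the sketch skips the masking of padded nodes. Each row contains a single 1, which is one reason such features will not match the scale of a dataset's original real-valued attributes.

```python
import numpy as np


def degree_one_hot(adj, num_classes):
    """One-hot encode node degrees from a dense adjacency matrix.

    Hypothetical helper for illustration only; not part of GDSS.
    """
    deg = adj.sum(axis=-1).astype(np.int64)
    if deg.max() >= num_classes:
        raise ValueError("num_classes must exceed the maximum degree")
    # Row k of the identity matrix is the one-hot vector for degree k.
    return np.eye(num_classes, dtype=np.float32)[deg]


# Star graph on three nodes: node 0 is connected to nodes 1 and 2.
adj = np.array([[0, 1, 1],
                [1, 0, 0],
                [1, 0, 0]], dtype=np.float32)
x = degree_one_hot(adj, num_classes=4)
```

Here `x` has shape `(3, 4)`, with node 0 encoded as degree 2 and the other two nodes as degree 1.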

@lizaitang
Author

Dear Author, thank you very much for your quick reply! How can we modify the code to load the node features of the dataset? Should we change init to zeros or ones? Or should we directly replace x_tensor = init_features(config.data.init, adjs_tensor, config.data.max_feat_num) with the features of the dataset?
def init_features(init, adjs=None, nfeat=10):

    if init=='zeros':
        feature = torch.zeros((adjs.size(0), adjs.size(1), nfeat), dtype=torch.float32, device=adjs.device)
    elif init=='ones':
        feature = torch.ones((adjs.size(0), adjs.size(1), nfeat), dtype=torch.float32, device=adjs.device)
    elif init=='deg':
        feature = adjs.sum(dim=-1).to(torch.long)
        num_classes = nfeat
        try:
            feature = F.one_hot(feature, num_classes=num_classes).to(torch.float32)
        except:
            print(feature.max().item())
            raise NotImplementedError(f'max_feat_num mismatch')
    else:
        raise NotImplementedError(f'{init} not implemented')

    flags = node_flags(adjs)

    return mask_x(feature, flags)

@harryjo97
Owner

You can change init_features in

def init_features(init, adjs=None, nfeat=10):

to take in graph_list as input and return the node features.
To be specific, each graph in the graph_list is a networkx Graph with node features.

Or you could directly modify x_tensor = init_features(config.data.init, adjs_tensor, config.data.max_feat_num) in

def graphs_to_dataloader(config, graph_list):

to obtain the original node features from the networkx Graph.

Please refer to the networkx documentation for more details.

FYI, the attributed graphs of the ENZYMES dataset are loaded by this function:
https://github.com/harryjo97/GDSS/blob/4d96334fd0d07577f9891e9d5e81dae4d64a92fd/data/data_generators.py#LL131C13-L131C13
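
A minimal sketch of the second option, assuming each networkx graph stores a fixed-length vector under some node attribute key. The helper name `node_attr_to_array` and its parameters are hypothetical and not part of GDSS. The sketch returns a NumPy array and rejects missing or wrongly sized attributes rather than padding or broadcasting them silently.

```python
import networkx as nx
import numpy as np


def node_attr_to_array(graph_list, max_node_num, feat_dim, key="label"):
    """Stack a per-node attribute into a zero-padded (B, N, F) array.

    Hypothetical helper for illustration; `key` is the node attribute name.
    """
    out = np.zeros((len(graph_list), max_node_num, feat_dim), dtype=np.float32)
    for b, g in enumerate(graph_list):
        # Rows follow the iteration order of g.nodes.
        for i, (_, value) in enumerate(g.nodes.data(key)):
            if value is None:
                raise KeyError(f"node attribute {key!r} is missing")
            vec = np.atleast_1d(np.asarray(value, dtype=np.float32))
            if vec.shape != (feat_dim,):
                raise ValueError(f"expected {feat_dim} values, got shape {vec.shape}")
            out[b, i] = vec
    return out


# Toy graph with a 2-dimensional attribute stored under the key "feature".
g = nx.path_graph(3)
nx.set_node_attributes(g, {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}, "feature")
x = node_attr_to_array([g], max_node_num=4, feat_dim=2, key="feature")
```

The result can be converted to a tensor with `torch.from_numpy(x)`. The row order should match whatever node order is used to build the adjacency tensor, otherwise features and edges will be misaligned.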

@lizaitang
Author

Dear Author,
Thank you so much for your quick reply. I have a minor question: I followed the format of graph_to_tensor to load the original node features, but for v, feature in g.nodes.data('feature') gives feature as None. Could you please help me fix it?

import networkx as nx
import numpy as np
import torch

def feat_to_tensor(graph_list, max_node_num, max_feat_num):
    feat_list = []

    for g in graph_list:
        assert isinstance(g, nx.Graph)

        # Zero-padded feature matrix with one row per node slot.
        node_feat = np.zeros([max_node_num, max_feat_num], dtype=float)
        # 'feature' is the key in question: it yields None when the nodes lack that attribute.
        for i, (v, feature) in enumerate(g.nodes.data('feature')):
            node_feat[i] = feature

        feat_list.append(node_feat)

    feat_np = np.asarray(feat_list)
    return torch.tensor(feat_np, dtype=torch.float32)

@harryjo97
Owner

In the graph loader code:

def graph_load_batch(min_num_nodes=20, max_num_nodes=1000, name='ENZYMES', node_attributes=True, graph_labels=True):

The node labels you are looking for are saved in g.nodes.data('label') (saved by Line 158 G.add_node(i + 1, label=data_node_label[i]))

You may want to try g.nodes.data('label') instead of g.nodes.data('feature').
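
The `None` values can be reproduced on a toy graph: in networkx, `G.nodes.data(key)` yields `None` (the default) for any node that has no attribute under that key. The following sketch uses made-up nodes and label values, and also shows one way to list the attribute keys a graph actually carries.

```python
import networkx as nx

# Toy graph whose nodes only carry a 'label' attribute.
G = nx.Graph()
G.add_node(1, label=2)
G.add_node(2, label=3)
G.add_edge(1, 2)

# Asking for a key that was never set returns the default (None) per node.
missing = list(G.nodes.data("feature"))
present = list(G.nodes.data("label"))

# Collect every attribute key present on any node.
keys = {k for _, attrs in G.nodes(data=True) for k in attrs}
```

Here `missing` holds `(node, None)` pairs, `present` holds the stored labels, and `keys` contains only `'label'`.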

@lizaitang
Author

Thanks for your reply, but if I want to generate graphs with node features in the same format, shouldn't we use the features instead of the node labels?

@harryjo97
Owner

I think the node features you want to use for the classifier are contained in the label.

@lizaitang
Author

Sorry to bother you, but I tried label and got:
```
[[2. 2. 2. ... 2. 2. 2.]
[2. 2. 2. ... 2. 2. 2.]
[2. 2. 2. ... 2. 2. 2.]
```
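
One possible explanation for uniform rows like these, which the thread does not confirm, is NumPy broadcasting: assigning a scalar label to a whole row of a feature matrix fills every column with that value. The sketch below contrasts that with a one-hot encoding. The label values are made up, and the assumption that labels start at 1 is labeled in the code.

```python
import numpy as np

labels = [2, 1, 3]   # made-up integer node labels
feat_dim = 3

# Assigning a scalar to a row broadcasts it across all columns.
broadcast = np.zeros((len(labels), feat_dim))
for i, lab in enumerate(labels):
    broadcast[i] = lab

# One-hot alternative; assumes labels are integers starting at 1.
one_hot = np.zeros((len(labels), feat_dim), dtype=np.float32)
for i, lab in enumerate(labels):
    one_hot[i, lab - 1] = 1.0
```

With these inputs the first row of `broadcast` is all 2s, while the first row of `one_hot` has a single 1 in the second column.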

@harryjo97
Owner

First of all, the labels contain values other than 2 (please see https://github.com/harryjo97/GDSS/blob/master/dataset/ENZYMES/ENZYMES_node_labels.txt)

Furthermore, if you want to use the node attributes in https://github.com/harryjo97/GDSS/blob/master/dataset/ENZYMES/ENZYMES_node_attributes.txt,
you may change the code in:

graphs = graph_load_batch(min_num_nodes=10, max_num_nodes=1000, name=dataset,

by setting node_attributes=True, which loads the node attributes file via
data_node_att = np.loadtxt(path + name + '_node_attributes.txt', delimiter=',')
in
def graph_load_batch(min_num_nodes=20, max_num_nodes=1000, name='ENZYMES', node_attributes=True, graph_labels=True):
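
As a self-contained sketch of what loading a comma-separated attribute file looks like, the example below parses an in-memory stand-in with `np.loadtxt` and attaches one row to each node. The attribute values are made up, and storing each row under the key `'feature'` is an assumption, so the key used by `graph_load_batch` should be checked in the repository.

```python
import io

import networkx as nx
import numpy as np

# In-memory stand-in for a *_node_attributes.txt file: one row per node.
attr_text = io.StringIO("0.5,1.0,2.0\n1.5,0.0,3.0\n")
node_att = np.loadtxt(attr_text, delimiter=",")

G = nx.Graph()
for i in range(node_att.shape[0]):
    # Node ids start at 1 here; the 'feature' key is an assumption.
    G.add_node(i + 1, feature=node_att[i])

# Map each node id to its attribute vector.
feats = {n: f for n, f in G.nodes.data("feature")}
```

Each entry of `feats` is then a real-valued vector, unlike the one-hot degree features used by default.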
