Hi GraphStorm Team,
we are currently developing an R-GCN-based GNN for heterogeneous graphs with roughly 8 different node types. The GNN is meant to detect fraudulent behaviour by analysing the relations between customers, their payment and authentication methods, and their transactions. We have a large, labeled dataset and want to predict which transactions are fraudulent. We use AWS Neptune as our graph database and have already implemented a NeptuneML model that works relatively well for our use case.
Because we want to do live predictions, our model must be inductive. Additionally, 6 of our node types have no features: we want to focus on the relations between the nodes, and no features arise naturally for these node types.
Our Setup
To transition to GraphStorm, we used the command-line training process for SageMaker with a slightly adapted model_train.yaml file and our own graph, which was prepared with graphstorm.gconstruct.construct_graph.
For use in a SageMaker endpoint, however, we had to adapt the GraphStorm model drastically, because it offered no way to return predictions to Python and would always write embeddings and predictions to a file. To load the model, we used GraphStorm's create_builtin_node_gnn_model function, which takes a DGLGraph and a GSConfig object as input. Since the function needs a DGLGraph to set up the model, we decided to use a minimal graph that has one node for each node type and one edge for each edge type, so that all required layers are built correctly. This worked, and we were able to load our trained model and run inference with it.
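To make the minimal-graph idea concrete, here is a rough sketch of how such a placeholder graph could be described. The node and edge type names are hypothetical and not our real schema, and both helper functions are made up for illustration. The only library call used is dgl.heterograph, which is imported lazily so the spec helper also runs without DGL installed.

```python
def minimal_graph_spec(canonical_etypes):
    """Describe a graph with one node per node type and one edge per edge type.

    canonical_etypes: iterable of (src_type, relation, dst_type) triples.
    Returns (data_dict, num_nodes_dict) in the shape dgl.heterograph expects.
    """
    data_dict = {}
    num_nodes_dict = {}
    for src, rel, dst in canonical_etypes:
        # A single edge from node 0 of the source type to node 0 of the target type.
        data_dict[(src, rel, dst)] = ([0], [0])
        num_nodes_dict[src] = 1
        num_nodes_dict[dst] = 1
    return data_dict, num_nodes_dict


def build_minimal_dgl_graph(canonical_etypes):
    """Build the placeholder DGLGraph (requires DGL to be installed)."""
    import dgl  # imported lazily on purpose

    data_dict, num_nodes_dict = minimal_graph_spec(canonical_etypes)
    return dgl.heterograph(data_dict, num_nodes_dict=num_nodes_dict)


# Hypothetical schema, for illustration only.
EXAMPLE_ETYPES = [
    ("customer", "makes", "transaction"),
    ("transaction", "paid_with", "payment_method"),
    ("customer", "uses", "auth_method"),
]
```

Note that a node type which appears in no edge type would not show up in this spec and would have to be added to num_nodes_dict by hand.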
Our Challenge
However, we realized that the create_builtin_node_gnn_model function also adds a GSNodeEncoderInputLayer, which uses a learnable node embedding whenever a node type has no features. This does not fit our use case: we want to perform inductive inference, which is impossible if per-node embeddings are required, because nodes that first appear at inference time have no trained embedding. Is there a way to solve this issue, for example by using node type embeddings? I looked for one in the documentation and in the code, but I only found a way to do feature construction, which we cannot use.
An alternative we thought about is to use maximally orthogonal vectors as node features, one vector per node type, so that each node type has a static initial node type embedding. We ran an optimization to increase the orthogonality between the vectors in order to maximize their usefulness as embeddings. Do you think this approach could work as a workaround?
I can provide more information if needed, but I tried to keep this as short and simple as possible.
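For reference, here is a minimal sketch of the per-type vector idea. It is not the optimization we actually ran: it assumes the feature dimension is at least the number of node types, in which case exactly orthonormal vectors can be obtained with plain Gram-Schmidt on random vectors. The function name and the node type names are hypothetical.

```python
import math
import random


def orthonormal_type_vectors(node_types, dim, seed=0):
    """Return one unit-length vector per node type, mutually orthogonal.

    Assumes dim >= len(node_types); otherwise exact orthogonality is
    impossible and an approximate method would be needed instead.
    """
    node_types = list(node_types)
    if dim < len(node_types):
        raise ValueError("dim must be at least the number of node types")
    rng = random.Random(seed)
    basis = []
    for _ in node_types:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # Modified Gram-Schmidt: remove the component along each earlier vector.
        for b in basis:
            proj = sum(x * y for x, y in zip(v, b))
            v = [x - proj * y for x, y in zip(v, b)]
        norm = math.sqrt(sum(x * x for x in v))
        basis.append([x / norm for x in v])
    return dict(zip(node_types, basis))


# Hypothetical featureless node types, for illustration only.
type_vectors = orthonormal_type_vectors(
    ["customer", "payment_method", "auth_method"], dim=8
)
```

Every node of a given type would get the same vector as its feature, so a node that first appears at inference time needs no trained per-node embedding. Whether the input layer then skips its learnable embedding for these types is our assumption, based on the observation that the embedding is only added for node types without features.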