Description of changes:
This PR reorganizes the code that performs inference for GNN and LM models.
Specifically, we split the nodes for inference based on locality. In this case, the embeddings are saved to local partitions via shared memory, and we only need to run a barrier before returning from the inference function.
However, when computing LM embeddings, we split the nodes evenly so that all processes take roughly the same amount of time. Otherwise, some processes time out in the barrier. Because the nodes are now split evenly, some embeddings must be written to remote memory. Before returning from the inference function, we need to call flush_data to ensure that all data written to distributed memory can be read correctly.
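The difference between the two splitting strategies can be illustrated with a minimal, self-contained sketch. It is not the code in this PR: `split_by_locality`, `split_evenly`, and the `owner` list are hypothetical names used only for illustration, and the sketch uses plain Python lists rather than distributed tensors. It shows why an even split balances the work per process but leaves some nodes on a process that does not own them, which is the case that requires remote writes followed by flush_data.

```python
from typing import List, Sequence


def split_by_locality(node_ids: Sequence[int], owner: Sequence[int], rank: int) -> List[int]:
    """Hypothetical helper: keep only the nodes stored in this rank's partition."""
    return [n for n in node_ids if owner[n] == rank]


def split_evenly(node_ids: Sequence[int], rank: int, world_size: int) -> List[int]:
    """Hypothetical helper: contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(node_ids), world_size)
    start = rank * base + min(rank, extra)
    end = start + base + (1 if rank < extra else 0)
    return list(node_ids[start:end])


# Assumed toy setup: 10 nodes, 2 processes, an unbalanced partition
# (rank 0 owns 7 nodes, rank 1 owns 3).
node_ids = list(range(10))
owner = [0] * 7 + [1] * 3
world_size = 2

local_split = [split_by_locality(node_ids, owner, r) for r in range(world_size)]
even_split = [split_evenly(node_ids, r, world_size) for r in range(world_size)]

# Nodes that a rank computes under the even split but does not own;
# their embeddings would have to be written to remote memory.
remote_nodes = [[n for n in even_split[r] if owner[n] != r] for r in range(world_size)]
```

In this toy setup the locality-based split gives the two processes 7 and 3 nodes, while the even split gives each process 5 nodes, two of which are remote for rank 1.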
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.