Fix a bug in inference code #696

Merged
merged 6 commits into from
Dec 30, 2023

Conversation

zheng-da
Contributor

@zheng-da zheng-da commented Dec 27, 2023

Description of changes:
This PR reorganizes the code that performs inference for GNN and LM models.
Specifically, we split the nodes for inference based on locality. In this case, the embeddings are saved to local partitions via shared memory, and we only need to run a barrier before returning from the inference function.
However, when computing LM embeddings, we split the nodes evenly so that all processes take roughly the same amount of time to compute them. Otherwise, some processes hit a timeout in the barrier. Because the nodes are now split evenly, a process may need to write data to remote memory. Before returning from the inference function, we need to call flush_data to ensure that all data written to distributed memory can be read correctly.
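
To illustrate the trade-off described above, here is a minimal pure-Python sketch; it is not the code in this PR. The helper names (`split_by_locality`, `split_evenly`, `remote_writes`) and the toy `owner_of` mapping are hypothetical and only show why an even split balances the work but produces writes to partitions owned by other processes, which is the case where a flush is needed before returning.

```python
# Illustrative sketch only: hypothetical helpers, not the PR's implementation.

def split_by_locality(node_ids, owner_of, rank):
    """Keep only the nodes whose partition is owned by `rank`."""
    return [n for n in node_ids if owner_of[n] == rank]


def split_evenly(node_ids, rank, world_size):
    """Give each rank a contiguous, near-equal slice of the nodes."""
    base, extra = divmod(len(node_ids), world_size)
    start = rank * base + min(rank, extra)
    end = start + base + (1 if rank < extra else 0)
    return node_ids[start:end]


def remote_writes(assigned, owner_of, rank):
    """Nodes assigned to `rank` whose embeddings live on another rank."""
    return [n for n in assigned if owner_of[n] != rank]


# Toy setup (assumption): 10 nodes, rank 0 owns nodes 0-6, rank 1 owns 7-9.
nodes = list(range(10))
owner_of = {n: 0 if n < 7 else 1 for n in nodes}

# Locality split: no remote writes, but the workload is unbalanced (7 vs 3).
local_sizes = [len(split_by_locality(nodes, owner_of, r)) for r in range(2)]

# Even split: balanced workload (5 vs 5), but rank 1 must write nodes 5 and 6
# to memory owned by rank 0, so the writes must be flushed before reading.
even_parts = [split_evenly(nodes, r, 2) for r in range(2)]
remote = [remote_writes(even_parts[r], owner_of, r) for r in range(2)]
```

In this toy case the locality split never writes remotely, so a barrier alone is enough, whereas the even split leaves rank 1 with two remote writes.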

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Ubuntu and others added 2 commits October 2, 2023 15:49
@zheng-da zheng-da requested a review from classicsong December 27, 2023 05:18
@zheng-da zheng-da added the ready able to trigger the CI label Dec 27, 2023
@zheng-da zheng-da merged commit e82cb8e into awslabs:main Dec 30, 2023
6 checks passed
2 participants