[Bug fix] Fix the read/delete contention bug when running distributed remapping of results/embeddings #672
error_and_exit $?

python3 $GS_HOME/tests/end2end-tests/data_process/check_edge_predict_remap.py --remap-output /tmp/ep_remap/pred/

error_and_exit $?

cnt=$(ls /tmp/ep_remap/pred/src_nids-*.pt | wc -l)
if test $cnt == 2
Should this be `$cnt != 2` or `$cnt == 0` to match the `then` output?
fi

cnt=$(ls /tmp/ep_remap/pred/dst_nids-*.pt | wc -l)
if test $cnt == 2
ditto
fi

cnt=$(ls /tmp/ep_remap/pred/predict-*.pt | wc -l)
if test $cnt == 2
ditto
error_and_exit $?

python3 $GS_HOME/tests/end2end-tests/data_process/check_edge_predict_remap.py --remap-output /tmp/ep_remap/rename-pred/ --column-names "src_nid,~from:STRING" "dst_nid,~to:STRING" "pred,pred:FLOAT"

cnt=$(ls /tmp/ep_remap/rename-pred/src_nids-*.pt | wc -l)
if test $cnt == 0
Should this be `$cnt != 0` to match the `then` output?
fi

cnt=$(ls /tmp/ep_remap/rename-pred/dst_nids-*.pt | wc -l)
if test $cnt == 0
ditto
fi

cnt=$(ls /tmp/ep_remap/rename-pred/predict-*.pt | wc -l)
if test $cnt == 0
ditto
Except for the result conditions, the rest LGTM.
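The review comments above ask whether each `test` condition agrees with the message printed in its `then` branch, which the hunks shown here cut off. As a minimal sketch of a check that keeps the two in sync, assuming the intent is "fail unless exactly N files match", the helper below derives the message from the same values it tests. The function name `expect_file_count` and the demo directory are hypothetical and not part of the PR.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the PR: fail when the number of files
# matching a pattern differs from the expected count.
expect_file_count() {
    local dir=$1 pattern=$2 expected=$3 cnt
    # find prints nothing when no file matches, so the count is simply 0.
    cnt=$(find "$dir" -maxdepth 1 -name "$pattern" | wc -l | tr -d '[:space:]')
    if test "$cnt" -ne "$expected"; then
        echo "expected $expected files matching $pattern in $dir, found $cnt"
        return 1
    fi
}

# Demo on a throwaway directory holding two files.
demo=$(mktemp -d)
touch "$demo"/src_nids-0000{0,1}.pt

expect_file_count "$demo" 'src_nids-*.pt' 2; two_files_status=$?
expect_file_count "$demo" 'src_nids-*.pt' 0 > /dev/null; zero_files_status=$?
```

Using `-ne` makes the comparison numeric, and building the message from `$expected` and `$cnt` means the condition and the output cannot drift apart.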
error_and_exit $?

python3 $GS_HOME/tests/end2end-tests/data_process/check_node_predict_remap.py --remap-output /tmp/np_remap/pred/

error_and_exit $?

cnt=$(ls /tmp/np_remap/pred/predict_nids-*.pt | wc -l)
if test $cnt == 0
Should this be `$cnt != 0` to match the `echo` contents?
exit -1
fi

cnt=$(ls /tmp/np_remap/pred/predict-*.pt | wc -l)
ditto
error_and_exit $?

python3 $GS_HOME/tests/end2end-tests/data_process/check_emb_remap.py --remap-output /tmp/em_remap/partial-emb/

error_and_exit $?

cnt=$(ls /tmp/em_remap/partial-emb/embed_nids-*.pt | wc -l)
if test $cnt == 2
ditto
fi

cnt=$(ls /tmp/em_remap/partial-emb/embed-*.pt | wc -l)
if test $cnt == 2
ditto
error_and_exit $?

python3 $GS_HOME/tests/end2end-tests/data_process/check_emb_remap.py --remap-output /tmp/em_remap/partial-rename-emb/ --column-names "nid,~id:STRING" "emb,emb:FLOAT"

error_and_exit $?

cnt=$(ls /tmp/em_remap/partial-rename-emb/embed_nids-*.pt | wc -l)
if test $cnt == 0
ditto
fi

cnt=$(ls /tmp/em_remap/partial-rename-emb/embed-*.pt | wc -l)
if test $cnt == 0
ditto
Issue #, if available:
When remap_result runs in a distributed way, some processes may still be collecting remap tasks (scanning the embedding and prediction files) while others have already finished their tasks and started removing the processed files. This causes read/delete contention.
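A minimal sketch of one way to avoid this kind of contention, assuming a two-phase scheme in which every worker finishes scanning before any worker deletes. It is an illustration only and not necessarily how this PR resolves the bug. The directories, file names, and worker count are hypothetical, and bash's `wait` on background jobs stands in for whatever barrier a real distributed run would use.

```shell
#!/usr/bin/env bash
# Illustration only: paths, file names, and worker count are hypothetical.
set -u

workdir=$(mktemp -d)   # stands in for a shared prediction directory
listdir=$(mktemp -d)   # where each worker records its task list
touch "$workdir"/predict-0000{0,1,2,3}.pt

num_workers=2

# Phase 1: every worker records the full, sorted list of files it can see.
scan() {
    local rank=$1
    find "$workdir" -name 'predict-*.pt' | sort > "$listdir/tasks.$rank"
}
for ((rank = 0; rank < num_workers; rank++)); do
    scan "$rank" &
done
wait   # barrier: no deletion starts until all scans have finished

# Phase 2: each worker removes only its own share of the listed files.
cleanup() {
    local rank=$1 i=0 f
    while read -r f; do
        if ((i % num_workers == rank)); then rm -f "$f"; fi
        i=$((i + 1))
    done < "$listdir/tasks.$rank"
}
for ((rank = 0; rank < num_workers; rank++)); do
    cleanup "$rank" &
done
wait

seen_by_0=$(wc -l < "$listdir/tasks.0")
seen_by_1=$(wc -l < "$listdir/tasks.1")
remaining=$(find "$workdir" -name 'predict-*.pt' | wc -l)
echo "rank 0 saw $seen_by_0, rank 1 saw $seen_by_1, remaining $remaining"
```

Without the first `wait`, a fast worker could delete files while a slow one is still listing them, so the slow worker would see an incomplete or inconsistent set of tasks.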
Description of changes:
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.