[Bug fix] Fix the read/delete contention bug when running distributed remapping result/emb #672

Merged on Dec 1, 2023 (9 commits)

Conversation

classicsong

Issue #, if available:
When remap_result runs in a distributed way, some processes may still be collecting remap tasks (scanning the embedding files and prediction files) while others have already finished their tasks and started removing the processed files. This causes read/delete contention.
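
One common way to avoid this kind of contention is to separate the scan phase from the cleanup phase with a barrier, so that no file is removed until every worker has finished listing its tasks. The sketch below illustrates the idea with local background jobs standing in for distributed workers. It is not the change made in this PR; the temporary directory, the file names, the `scan_tasks` helper, and the worker count are all assumptions for illustration.

```shell
#!/usr/bin/env bash
# Illustrative only: local background jobs stand in for distributed workers.
workdir=$(mktemp -d)

# Create four dummy prediction files (hypothetical names).
for i in 0 1 2 3; do
    : > "$workdir/predict-0000$i.pt"
done

# Hypothetical worker: records every file it would need to remap.
scan_tasks() {
    local rank=$1
    ls "$workdir"/predict-*.pt > "$workdir/tasks-$rank.txt"
}

# Phase 1: all workers scan concurrently.
for rank in 0 1; do
    scan_tasks "$rank" &
done

# Barrier: block until every background scan has finished.
wait

# Phase 2: only now is it safe to remove the processed files.
rm -f "$workdir"/predict-*.pt

# Arithmetic expansion strips any padding that wc may add.
scanned_rank0=$(( $(wc -l < "$workdir/tasks-0.txt") ))
scanned_rank1=$(( $(wc -l < "$workdir/tasks-1.txt") ))
remaining=$(( $(find "$workdir" -name 'predict-*.pt' | wc -l) ))
echo "rank0 saw $scanned_rank0 files, rank1 saw $scanned_rank1 files, $remaining left"
```

Without the `wait`, the `rm` could run while a scan is still listing the directory, which is the read/delete race described above. In a real multi-machine run the barrier would have to come from the distributed runtime rather than from the shell.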

Description of changes:

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@classicsong classicsong requested a review from zheng-da November 30, 2023 09:32
@classicsong classicsong added the "ready" label (able to trigger the CI) Nov 30, 2023
@classicsong classicsong merged commit 20394ee into awslabs:main Dec 1, 2023
6 checks passed
@classicsong classicsong deleted the fix-remap-bug branch December 1, 2023 21:21
error_and_exit $?

python3 $GS_HOME/tests/end2end-tests/data_process/check_edge_predict_remap.py --remap-output /tmp/ep_remap/pred/

error_and_exit $?

cnt=$(ls /tmp/ep_remap/pred/src_nids-*.pt | wc -l)
if test $cnt == 2

Should this be $cnt != 2 or $cnt == 0 to match the then output?

fi

cnt=$(ls /tmp/ep_remap/pred/dst_nids-*.pt | wc -l)
if test $cnt == 2

ditto

fi

cnt=$(ls /tmp/ep_remap/pred/predict-*.pt | wc -l)
if test $cnt == 2

ditto

error_and_exit $?

python3 $GS_HOME/tests/end2end-tests/data_process/check_edge_predict_remap.py --remap-output /tmp/ep_remap/rename-pred/ --column-names "src_nid,~from:STRING" "dst_nid,~to:STRING" "pred,pred:FLOAT"

cnt=$(ls /tmp/ep_remap/rename-pred/src_nids-*.pt | wc -l)
if test $cnt == 0

Should this be $cnt != 0 to match the then output?

fi

cnt=$(ls /tmp/ep_remap/rename-pred/dst_nids-*.pt | wc -l)
if test $cnt == 0

ditto

fi

cnt=$(ls /tmp/ep_remap/rename-pred/predict-*.pt | wc -l)
if test $cnt == 0

ditto

@zhjwy9343 zhjwy9343 left a comment

Except for the result conditions, the rest LGTM.
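
The inline comments question whether each condition matches the message printed in its then branch. One way to make that kind of mismatch harder to introduce is to route every count check through a single helper that takes the expected number and fails whenever the actual count differs. The sketch below is a suggestion only: the `expect_file_count` function, its message, and the demo directory are hypothetical and do not appear in the PR.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only when exactly $2 files match the
# -name pattern $1 directly inside directory $3.
expect_file_count() {
    local pattern=$1 expected=$2 dir=$3
    local cnt
    # find prints nothing when no file matches, so the count is 0
    # without the error that ls emits on an unmatched glob.
    cnt=$(( $(find "$dir" -maxdepth 1 -name "$pattern" | wc -l) ))
    if test "$cnt" -ne "$expected"; then
        echo "expected $expected files matching $pattern in $dir, found $cnt"
        return 1
    fi
}

# Demo on a throwaway directory holding two dummy files.
demo_dir=$(mktemp -d)
: > "$demo_dir/predict-00000.pt"
: > "$demo_dir/predict-00001.pt"

# Two files are present, so expecting 2 passes and expecting 0 fails.
kept_status=0
expect_file_count 'predict-*.pt' 2 "$demo_dir" || kept_status=$?
removed_status=0
expect_file_count 'predict-*.pt' 0 "$demo_dir" || removed_status=$?
echo "kept check: $kept_status, removed check: $removed_status"
```

Because the failure branch is always "count differs from expected", the message and the condition cannot drift apart. The helper also uses `-ne`, the numeric comparison operator of `test`, instead of the string comparison `==`.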

error_and_exit $?

python3 $GS_HOME/tests/end2end-tests/data_process/check_node_predict_remap.py --remap-output /tmp/np_remap/pred/

error_and_exit $?

cnt=$(ls /tmp/np_remap/pred/predict_nids-*.pt | wc -l)
if test $cnt == 0

Should this be $cnt != 0 to match the echo contents?

exit -1
fi

cnt=$(ls /tmp/np_remap/pred/predict-*.pt | wc -l)

ditto

error_and_exit $?

python3 $GS_HOME/tests/end2end-tests/data_process/check_emb_remap.py --remap-output /tmp/em_remap/partial-emb/

error_and_exit $?

cnt=$(ls /tmp/em_remap/partial-emb/embed_nids-*.pt | wc -l)
if test $cnt == 2

ditto

fi

cnt=$(ls /tmp/em_remap/partial-emb/embed-*.pt | wc -l)
if test $cnt == 2

ditto

error_and_exit $?

python3 $GS_HOME/tests/end2end-tests/data_process/check_emb_remap.py --remap-output /tmp/em_remap/partial-rename-emb/ --column-names "nid,~id:STRING" "emb,emb:FLOAT"

error_and_exit $?

cnt=$(ls /tmp/em_remap/partial-rename-emb/embed_nids-*.pt | wc -l)
if test $cnt == 0

ditto

fi

cnt=$(ls /tmp/em_remap/partial-rename-emb/embed-*.pt | wc -l)
if test $cnt == 0

ditto
