RDMA TLB benchmarks across different feature dimensions #19
Re-analyzed this today. A core reason the cost of TLB misses depends on the feature dim is most likely the ratio of page-table-walk (PTW) time to feature-read time: when features are large, the PTW overhead is relatively small, but when features are small, PTW accounts for a much larger share of the total read time. So there are really two dimensions to sweep here:
IB Params: `POST_LIST_SIZE = 128`, `CQ_MOD = 1`, `QP_NUM = 8`, `TX_DEPTH = 2048`

| Config  | 2 machines × 2 GPUs | 2 machines × 4 GPUs | 2 machines × 6 GPUs |
|---------|---------------------|---------------------|---------------------|
| W/O TLB |                     |                     |                     |
| W/ TLB  |                     |                     |                     |
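The argument above can be sketched with a toy cost model. All numbers below (link bandwidth, PTW cost, miss rate) are hypothetical placeholders, not measurements from this issue; the point is only that a fixed per-miss PTW cost dominates when the per-feature transfer time shrinks with the feature dimension.

```python
# Toy model (hypothetical constants): fraction of per-feature read time
# spent in page-table walks (PTW) after a TLB miss. The transfer time
# scales with feature dim, while the PTW cost per miss is roughly fixed,
# so small dims see a much larger relative PTW overhead.

BYTES_PER_FLOAT = 4
LINK_BW = 10e9      # bytes/s -- assumed link bandwidth, not measured
PTW_COST = 1e-6     # seconds per TLB miss -- assumed, not measured

def ptw_overhead_fraction(dim: int, miss_rate: float = 0.5) -> float:
    """Share of one feature read's total time spent walking page tables."""
    transfer = dim * BYTES_PER_FLOAT / LINK_BW
    ptw = miss_rate * PTW_COST
    return ptw / (transfer + ptw)

for dim in (16, 128, 1024):
    print(dim, round(ptw_overhead_fraction(dim), 3))
```

With these placeholder constants the PTW share falls monotonically as the dim grows, which matches the intuition in the comment: sweeping feature dim and TLB on/off independently is what separates the two effects.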
RDMA TLB Results
call for help: @Aiemu
https://github.com/quiver-team/quiver-feature/blob/main/tests/python/test_MultiMachineDistTensorClientServer.py
IB Params:
FeatureDim = 128, Tensor Size: 228.8818359375 GB, Sample Size = 250000
W/O TLB
- 2 machines × 2 GPUs: 8488.63404334975 MB/s
- 2 machines × 4 GPUs:
- 2 machines × 6 GPUs:

W/ TLB
- 2 machines × 2 GPUs:
- 2 machines × 4 GPUs:
- 2 machines × 6 GPUs:
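A quick sanity check on the reported numbers, assuming float32 features and GiB = 2^30 bytes (both assumptions, since the issue does not state the dtype): the tensor size implies the node count, and each sample request moves a fixed payload.

```python
# Back-of-envelope check on the benchmark setup above.
# Assumes float32 (4 bytes) features and binary units (GiB = 2**30 bytes).

FEATURE_DIM = 128
BYTES_PER_FLOAT = 4
TENSOR_SIZE_GIB = 228.8818359375   # reported tensor size
SAMPLE_SIZE = 250_000              # rows fetched per sample request

# Node count implied by the reported tensor size.
num_nodes = TENSOR_SIZE_GIB * 2**30 / (FEATURE_DIM * BYTES_PER_FLOAT)

# Payload of one sample request, in MiB.
payload_mib = SAMPLE_SIZE * FEATURE_DIM * BYTES_PER_FLOAT / 2**20

print(int(num_nodes))   # 480000000 nodes
print(payload_mib)      # 122.0703125 MiB per request
```

So each request moves about 122 MiB regardless of TLB behavior, which is why per-request latency differences show up directly in the MB/s figures being collected above.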