What is your environment (Kubernetes version, Fluid version, etc.)
Kubernetes version: v1.28
Fluid version: 1.0.1-14eda3b
Kubernetes cluster: 1 control plane + 1 node + 1 remote storage server with an S3 bucket
cache runtime: Alluxio, 20G MEM, 1 replica
training job: pretrained ResNet-50 model with 10G of raw data consisting of small files (.jpg).
Describe the bug
test scenarios:
1. The 10G of raw data is stored in the S3 bucket on the remote storage server, and Fluid is used to expose this S3 bucket as a PV in the k8s cluster. This PV is mounted in the training job pod as the raw training data directory. Before training starts, the dataset is preloaded into the Fluid cache system.
2. Without Fluid, the 10G of raw data is stored in a local directory on the k8s node, and hostPath is used to mount this directory into the training job pod as the raw training data directory.
Results of the two scenarios above:
training time of scenario 1: 155.51s (without preloading the dataset, training time is 950.9s)
training time of scenario 2: 141.7s
What you expect to happen:
As far as I understand, in scenario 2 the raw data is stored in a local directory but not in memory, while in scenario 1 the raw data has already been cached in the Fluid cache system (that is, in the k8s node's memory). So the training time of scenario 1 should be shorter than that of scenario 2, because memory access is much faster than access to a local directory.
But the training time of scenario 1 is always about 10s longer than that of scenario 2.
How to reproduce it
Additional Information
@liumiaomiaoIntel Thank you for opening the issue. I think we need more details. For example, if one scenario reads from a remote cache over the network while the other reads from a local path that may already be in the page cache, the test results may differ. Please reach me by email: [email protected].
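One way to gather such details is to time raw file reads separately from training. The following is only a minimal sketch, not something taken from this issue: `time_directory_read` and `report` are hypothetical helper names, and the directories passed on the command line stand in for whatever paths the Fluid PV and the hostPath volume are mounted at inside the pod. Reading each directory twice gives a rough idea of how much the second pass benefits from the page cache.

```python
import os
import sys
import time


def time_directory_read(root, chunk_size=1 << 20):
    """Read every file under root once and return (files, bytes, seconds)."""
    n_files = 0
    n_bytes = 0
    start = time.perf_counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            with open(os.path.join(dirpath, name), "rb") as f:
                # Read in chunks so large files do not need to fit in memory.
                while True:
                    chunk = f.read(chunk_size)
                    if not chunk:
                        break
                    n_bytes += len(chunk)
            n_files += 1
    elapsed = time.perf_counter() - start
    return n_files, n_bytes, elapsed


def report(label, root):
    """Print a one-line summary of a full read pass over root."""
    files, size, secs = time_directory_read(root)
    mib = size / (1024 * 1024)
    rate = mib / secs if secs > 0 else float("inf")
    print(f"{label}: {files} files, {mib:.1f} MiB in {secs:.2f}s ({rate:.1f} MiB/s)")
    return files, size, secs


if __name__ == "__main__":
    # Each argument is a directory to measure (hypothetical mount paths).
    for path in sys.argv[1:]:
        report(f"{path} (first pass)", path)
        report(f"{path} (second pass)", path)
```

Running this inside the training pod against both mount points would show whether the roughly 10s gap comes from data access or from something else in the training loop. With many small .jpg files, per-file open and metadata latency may matter more than raw throughput, so the file count is reported alongside the byte count.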