
Docs Translation: Storage Issue #3818

Closed

SteveSamJacob19 opened this issue Jul 25, 2024 · 4 comments

Labels
infrastructure: Work that enables/improves hosting the site and its content

SteveSamJacob19 commented Jul 25, 2024

What's the status of docs translation?

  • The code for the docs translation is done, according to Natalie, and the site can be built locally (although the build takes a long time).

  • The issue is that running the pipeline after the code is pushed to the demo2 branch fails with the error “No space left on device”.

  • The docs contain around 2,000+ files, so having translated docs for ja and zha adds another 4,000+ files, increasing the size massively.

  • Natalie has moved to a different project, so the work has been taken over by Steve.

What's the error?

  1. Information received from Natalie:

    • The main issue occurs during a build that includes the translation changes. The build fails when trying to execute step 46/47 in the dockerfile.draft file.

    • The machine used to build the website reportedly runs out of memory and shows the error: `Copy file range failed: No space left on device`.

    • We use the “IBM Managed Workers”, which are shared worker machines.

[Screenshot: pipeline build failure log, 2024-07-25 11:34 AM]
  2. Fixes that Natalie tried:

    • Changing line 260 in the ci-pipeline.yml file to increase the memory argument to 4g, trying to match or estimate the storage available on the machine that builds the website: `docker build --tag "$IMAGE_URL:latest" --memory 4g --file $PATH_TO_DOCKERFILE/$DOCKERFILE $PATH_TO_CONTEXT`

    • Tried modifying the storage of the shared workers using t-shirt sizing (managed worker virtual machine sizing), which adds a label to our pipeline telling the IBM Managed Worker pool to assign a VM with more memory.

    • Tried removing additional or unnecessary files from the code base to reduce space.
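Worth noting: `--memory` on `docker build` constrains RAM (a cgroup limit), while “No space left on device” is a disk condition, so the two fixes above may not touch the failing resource at all. A minimal diagnostic sketch for telling the two apart on the worker (paths and command availability are assumptions about the build machine, not taken from the pipeline):

```shell
#!/bin/sh
# Sketch: distinguish disk from RAM on the build machine.
# "No space left on device" comes from disk; `--memory` limits RAM.

echo "Disk backing the Docker data root (falls back to /):"
df -h /var/lib/docker 2>/dev/null || df -h /

echo "RAM, which is what --memory actually constrains:"
free -h 2>/dev/null || true
```

If the disk line shows the filesystem near 100% while RAM is mostly free, the t-shirt sizing and `--memory` changes would not be expected to help.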

  3. Inferences and doubts after Steve took over:

    • Natalie mentioned that the VM does not have any storage, only memory; however, it seems that the host machine should have disk storage, and the issue is with storage space, not memory (RAM).

    • This could be why the changes to the memory argument and the t-shirt sizing did not work.

    • Need to confirm whether the shared worker machine is spun up fresh every time a pipeline runs, or whether it retains past images.

    • If the former, the storage is ephemeral and we shouldn't have much trouble increasing it, since it will be cleared as soon as the pod finishes its task.

    • But this is only the first part: even if we can increase the storage size of the VM enough to build the image, the ICR may not have enough storage to store the image.

    • We added `docker system df` and `docker images` before the docker build step in the Tekton file ci-pipeline.yml inside the ci-pipeline-draft folder, to see all the images present in the pod and the space occupied and available. On running it, we found no images, which should confirm that the host machine spins up new pods every time. Interestingly, `docker system df` also reported 0 for both space used and space available in the container, even without any images.

    [Screenshot: output of `docker system df` and `docker images`, 2024-07-25 11:31 AM]
    • We also ran `df -h /` to get the size of the pod; it showed around 30 GB with only 1% used.
[Screenshot: `df -h /` output, 2024-07-25 1:42 PM]
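For reference, the diagnostics described above were added immediately before the build command. A hypothetical sketch of what that step in ci-pipeline.yml could look like (step name, image, and field layout are assumptions, not the real file):

```yaml
# Hypothetical sketch of the diagnostics added before the build step.
- name: docker-build
  image: docker
  script: |
    docker system df   # per-daemon disk usage: images, containers, build cache
    docker images      # would show leftover images if pods were reused
    df -h /            # filesystem size of the step container itself
    docker build --tag "$IMAGE_URL:latest" \
      --file "$PATH_TO_DOCKERFILE/$DOCKERFILE" "$PATH_TO_CONTEXT"
```

One possible explanation for `docker system df` reporting 0 while `df -h /` reports 30 GB: the docker client in the step talks to a daemon (for example, a docker-in-docker sidecar) whose storage lives on a different volume than the step container's own filesystem, so the two commands measure different disks.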

What can be done?

  • If we confirm that the host machine does use disk storage, then we can try increasing the ephemeral storage. However, we found multiple values for ephemeral storage and are unsure which one actually corresponds to the pod's disk: it is set to 0.4G in the ce-openliberty.io-draft-pipeline ci-pipeline.yml file, but to 0.5G in the CI pipeline runs in IBM Cloud, while `df -h /` and `docker system df` report 30 GB and 0 respectively.

  • Increase the size of the pod from 30 GB to a higher value.

  • Use a separate virtual disk that can be volume-mounted, so the images are stored on the disk and referenced. This will make the builds slower, though.
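If the ephemeral-storage route is pursued, the knob is a Kubernetes pod resource request/limit. A hypothetical fragment (values and placement are illustrative; where it goes depends on how the Tekton/Code Engine pipeline exposes pod resources):

```yaml
# Hypothetical sketch: requesting more ephemeral storage for the pipeline pod.
resources:
  requests:
    ephemeral-storage: "2Gi"
  limits:
    ephemeral-storage: "30Gi"
```

Note that ephemeral-storage limits govern the pod's writable layers and emptyDir volumes; if the docker daemon's data lives on a separately sized volume, this setting would not affect it.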

SteveSamJacob19 added the infrastructure label Jul 25, 2024
SteveSamJacob19 self-assigned this Jul 25, 2024
github-project-automation moved this to New (Untriaged) in Website backlog Jul 25, 2024
SteveSamJacob19 commented Jul 29, 2024

This issue was raised to the CD-CC team headed by Kevin Smith. Unfortunately, they are not familiar with our pipelines, although they also suggested trying to increase the ephemeral storage, in line with my inference above, and asked me to raise a ticket with cloud support.

SteveSamJacob19 commented

After discussions with Kin, we have found that the ephemeral storage is only used to start up Code Engine and hence does not affect the storage of the worker node. I have raised a ticket with cloud support for further help.

SteveSamJacob19 moved this from New (Untriaged) to In Development in Website backlog Jul 30, 2024
SteveSamJacob19 commented

The cloud support team asked me to increase the PVC value in the ci-listener.yml file from 5Gi. I tried 10Gi, 15Gi, and 30Gi, but each resulted in the same error.
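For context, the change attempted here is the storage request in a PersistentVolumeClaim. A hypothetical fragment of what was varied in ci-listener.yml (field layout is the standard PVC shape, not a quote from the real file):

```yaml
# Hypothetical sketch of the PVC size change; only the storage value was
# varied (5Gi -> 10Gi / 15Gi / 30Gi), with no effect on the error.
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 30Gi
```

That none of these sizes helped is consistent with the build's scratch space living somewhere other than this claim.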

SteveSamJacob19 commented

I had a meeting with Olivier of the IBM Cloud support team, who after careful inspection found that we had assigned a size value of 20G to the sidecar, which gets used up during the docker build step. Changing it to 50G solved the issue, and the build is now successful.
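For future reference, a common shape for this kind of sidecar sizing in a Tekton task is an emptyDir volume with a sizeLimit backing the docker-in-docker daemon's data root. A hypothetical sketch of the fix (names and field layout are assumptions; the real pipeline may differ):

```yaml
# Hypothetical sketch: the sidecar's scratch volume, which backs
# /var/lib/docker during the build, raised from 20G to 50G.
sidecars:
  - name: dind
    image: docker:dind
    volumeMounts:
      - name: dind-storage
        mountPath: /var/lib/docker
volumes:
  - name: dind-storage
    emptyDir:
      sizeLimit: 50Gi
```

This also explains the earlier observation that `df -h /` in the step showed 30 GB mostly free: the build was exhausting the sidecar's 20G volume, not the step container's root filesystem.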

@SteveSamJacob19 SteveSamJacob19 moved this from In Development to Closed/Done in Website backlog Aug 9, 2024