Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed cleanup of temporary files from tasks #1142

Merged
merged 1 commit into from
Nov 4, 2024

Conversation

hstct
Copy link
Contributor

@hstct hstct commented Aug 26, 2024

closes #1141

@hstct hstct force-pushed the cleanup_tmp_files branch 3 times, most recently from 2d062e6 to 548d306 Compare August 28, 2024 12:20
@hstct hstct marked this pull request as ready for review September 3, 2024 09:45
@@ -92,6 +96,10 @@ def validate(self):
"service script, must end with the 'InRelease' file name!"
)
raise RuntimeError(message)
files_to_cleanup.append(inline_path) # Track file for cleanup
directories_to_cleanup.append(
os.path.dirname(inline_path)
Copy link

@sbernhard sbernhard Sep 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we really clean up the directory AND the file?
would it be better to create allt the temp files one specific temp directory and then ONLY remove this temp directory?

I don't know what the os.path.dirname(inline_path) directory is but if it would be /tmp or /var/tmp we are ruined

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each task will create their own tmp.{task-id-slug} directory in /tmp/. If I don't clean them up than you end up with a lot of empty directories.

Changing it that they all get created into a specific directory seems more involved.

The inline_path gets taken from the input:

        {
          "signatures": {
            "inline": "<relative_path>/InRelease",
            "detached": "<relative_path>/Release.gpg",
          }
        }

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have one directory for each task, how about saving the tmp directories path for each task and after the task finished (or an exception occurs), remove the directory completely?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should work like this now? It doesn't use the pulpcore task but creates its own temp directory. But I'm not sure I follow exactly what you mean. It's better to just delete it after the validation is done because we won't need it anymore.

@mdellweg
Copy link
Member

To my knowledge, each task is performed in a temporary directory:
https://github.com/pulp/pulpcore/blob/main/pulpcore/tasking/_util.py#L126

Is that mechanism somehow not working?

@hstct
Copy link
Contributor Author

hstct commented Sep 19, 2024

To my knowledge, each task is performed in a temporary directory: https://github.com/pulp/pulpcore/blob/main/pulpcore/tasking/_util.py#L126

Is that mechanism somehow not working?

Not sure but the tmp dirs that are created for the signing service stuff look like this:

[root@1664e82d3482 tmp]# tree tmp*
tmp.1Yg9JNMlfs
├── InRelease
└── Release.gpg
tmp.4CDioeKNzT
├── InRelease
└── Release.gpg
tmp.5w3Hljn54v
├── InRelease
└── Release.gpg
tmp.6oIwBihpBI
├── InRelease
└── Release.gpg

@hstct hstct force-pushed the cleanup_tmp_files branch 3 times, most recently from 2123727 to 7ec775f Compare September 19, 2024 15:18
@mdellweg
Copy link
Member

I do not see any new temporary directory being created here:
https://github.com/pulp/pulpcore/blob/main/pulpcore/app/models/content.py#L788

So I assume at this point the process should "be" in the temporary directory of the task.
Does the signing script itself create these new directories?

@quba42
Copy link
Collaborator

quba42 commented Sep 23, 2024

Does the signing script itself create these new directories?

I think so. The signing service script we use on production instances has:

OUTPUT_DIR="$(/usr/bin/mktemp -d)"
DETACHED_SIGNATURE_PATH="${OUTPUT_DIR}/Release.gpg"
INLINE_SIGNATURE_PATH="${OUTPUT_DIR}/InRelease"

It then passes the paths of the created files back to Pulp via the return value. I presume pulp_deb then turns those files into publication artifacts, and once that is done, it should cleanup the files (as well as the containing temp directory)?

@hstct
Copy link
Contributor Author

hstct commented Sep 23, 2024

Yes the signing script does create tmpfiles this was an issue with the files created by the tests. The workaround here is that I added a log for the signing script that will be used during the cleanup in the signing service fixture

@mdellweg
Copy link
Member

I'd prefer if we can teach the signing script to write files in the current directory that then is already a temporary directory.
From the perspective of the script itself, it's not actually writing temporary files but output files in the first place they should only be considered temporary in the context of the task code.

@sbernhard
Copy link

I'd prefer if we can teach the signing script to write files in the current directory that then is already a temporary directory. From the perspective of the script itself, it's not actually writing temporary files but output files in the first place they should only be considered temporary in the context of the task code.

Agree, or the signing service should be responsible to cleanup temp files itself.

bastian-src

This comment was marked as off-topic.

@quba42
Copy link
Collaborator

quba42 commented Sep 24, 2024

I'd prefer if we can teach the signing script to write files in the current directory that then is already a temporary directory.

Would this imply teaching pulpcore to send in an extra parameter to the script? Does [https://github.com/pulp/pulpcore/blob/6a2d120a10b7a696a1282087eb51c5112cbb8351/pulpcore/app/models/content.py#L788](the pulpcore sign method) even have a way of knowing it is being called from a task? How can it receive the temp folder path?

Agree, or the signing service should be responsible to cleanup temp files itself.

The signing script returns what files it created as the return value. I guess manually cleaning those up, is what this PR is trying to do. I don't think the signing script could be made to clean up behind itself, since it has no way of knowing when Pulp is done with the files.

@mdellweg
Copy link
Member

I'd prefer if we can teach the signing script to write files in the current directory that then is already a temporary directory.

Would this imply teaching pulpcore to send in an extra parameter to the script? Does [https://github.com/pulp/pulpcore/blob/6a2d120a10b7a696a1282087eb51c5112cbb8351/pulpcore/app/models/content.py#L788](the pulpcore sign method) even have a way of knowing it is being called from a task? How can it receive the temp folder path?

As far as i see, this script should only be called in two circumstances:

  1. When validating a new singing service (see [0,1])
  2. When being called from a task

In both cases, I expect the CWD (current working directory) to be a temporary one. If that is not true, that's the thing we should seek to fix.

[0] https://github.com/pulp/pulpcore/blob/main/pulpcore/app/models/content.py#L903
[1] https://github.com/pulp/pulp_deb/blob/main/pulp_deb/app/models/signing_service.py#L35

@quba42
Copy link
Collaborator

quba42 commented Sep 25, 2024

@mdellweg I will have a go at testing a signing service script that just creates the file in the working directory.

@hstct
Copy link
Contributor Author

hstct commented Oct 1, 2024

As far as i see, this script should only be called in two circumstances:

  • When validating a new singing service (see [0,1])
  • When being called from a task

In both cases, I expect the CWD (current working directory) to be a temporary one. If that is not true, that's the thing we should seek to fix.

The cwd is always the root of the project never the tmp directory. So @quba42 suggestion to expand the the SigningService.sign method with another parameter seems a proper way to fix this than you can just tell it where to create the tmp files.

Still pondering if there might be another solution that doesn't involve a change in pulpcore.

@mdellweg
Copy link
Member

mdellweg commented Oct 1, 2024

Still pondering if there might be another solution that doesn't involve a change in pulpcore.

Honestly, at this point, i think it is a bug in pulpcore and should be fixed there. It must be possible to call the signing script inside of a specific directory.
Maybe the issue is that with TemporaryDirectory: does not do the os.chdir for us.
Or os.subprocess needs to be fed the cwd explicitely.

@hstct
Copy link
Contributor Author

hstct commented Oct 22, 2024

This PR now requires this pulpcore PR pulp/pulpcore#5924

What changed is that now the sign and asign methods allow for a an optional output path which the signing script then uses. Here we let the publish create a directory inside that temp folder and this now appears to solve the issue where the content scope did not clean it up afterwards.

The sync task still unzips the Package archive from upstream and saves it in the /tmp/ directory as well. This one needs manual cleanup since the file gets used out of content scope.

With the changes all relevant tmp files should now properly be cleaned up.

@hstct
Copy link
Contributor Author

hstct commented Oct 23, 2024

Thanks to @mdellweg I was able to achieve the same thing without relying on a pulpcore by using environment variables change see: pulp/pulpcore#5924 (comment)

@mdellweg
Copy link
Member

So i did some experimentation. And indeed, i am pretty certain, that the script inherits the cwd of the task, which is beneath the tasks workers working directory and so we already have two levels of cleanup for it.
However, i think the PWD environment variable may not be set, but calling "pwd" should have the desired result (same as ls -d /proc/self/cwd). Can you confirm that?
I believe this whole issue should be able to be solved solely inside the signing script when we stop trying to do something clever with extra temporary files and start just using the current directory.

@hstct
Copy link
Contributor Author

hstct commented Oct 28, 2024

@mdellweg I'm not sure, really. If I call pwd in the script it will also try to use the project root e.g. /src/pulp_deb/ in the oci-env. Unless I'm forgetting something?

- Add temp dir to env variables for signing service.
- Ensure proper cleanup of uncompressed files in synchronization tasks.

closes pulp#1141
Copy link
Collaborator

@quba42 quba42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have one minor suggestion. It is a take it or leave it kind of thing, so I am not waiting with the ACK.

pulp_deb/app/tasks/publishing.py Show resolved Hide resolved
@hstct hstct merged commit eb5c455 into pulp:main Nov 4, 2024
12 checks passed
@hstct hstct deleted the cleanup_tmp_files branch November 4, 2024 15:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Clean tmp files after tasks are done
5 participants