Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: mtree_spec rule does not encode filenames with special characters #794

Open
pcj opened this issue Mar 19, 2024 · 7 comments · May be fixed by #795 or #985
Open

[Bug]: mtree_spec rule does not encode filenames with special characters #794

pcj opened this issue Mar 19, 2024 · 7 comments · May be fixed by #795 or #985
Labels
bug Something isn't working untriaged Requires traige

Comments

@pcj
Copy link

pcj commented Mar 19, 2024

What happened?

The mtree_spec rule generates an invalid mtree entries for filenames with special characters. For example:

server.runfiles/pypi/_lock/[email protected]/site-packages/setuptools/command/launcher manifest.xml uid=0 gid=0 time=1672560000 mode=0755 type=file content=bazel-out/darwin_arm64-fastbuild-ST-5b66dcbb8b68/bin/external/pypi/_lock/[email protected]/site-packages/setuptools/command/launcher manifest.xml

Causes:

bsdtar: Error reading archive bazel-out/darwin_arm64-fastbuild/bin/server/server_image.packages_tar_manifest.spec: Can't open bazel-out/darwin_arm64-fastbuild-ST-5b66dcbb8b68/bin/external/pypi/_lock/[email protected]/site-packages/setuptools/command/launcher

Filenames should be encoded according to vis (see man vis)

In this case it should be encoded as:

server.runfiles/pypi/_lock/[email protected]/site-packages/setuptools/command/launcher\040manifest.xml uid=0 gid=0 time=1672560000 mode=0755 type=file content=bazel-out/darwin_arm64-fastbuild-ST-5b66dcbb8b68/bin/external/pypi/_lock/[email protected]/site-packages/setuptools/command/launcher\040manifest.xml

References:

Version

http_archive(
  name = "aspect_bazel_lib",
  generator_name = "aspect_bazel_lib",
  generator_function = "fetch_dependencies",
  urls = ["https://github.com/aspect-build/bazel-lib/archive/40948111707ff0f491d14ea4bd016b78498d14cf.tar.gz"],
  sha256 = "b69d39dd191bcd6f4ac6d62b0124ea460caf530bb18ea678c9778fe4046c21f9",
  strip_prefix = "bazel-lib-40948111707ff0f491d14ea4bd016b78498d14cf",
)

How to reproduce

No response

Any other information?

No response

@pcj pcj added the bug Something isn't working label Mar 19, 2024
@github-actions github-actions bot added the untriaged Requires traige label Mar 19, 2024
@pcj pcj changed the title [Bug]: mtree_spec function does not encode filenames with special characters [Bug]: mtree_spec rule does not encode filenames with special characters Mar 19, 2024
@pcj
Copy link
Author

pcj commented Mar 19, 2024

Hmm I started a PR on this and am realizing that the use of bsdtar and mtree with bazel+starlark is complicated.

The mtree specification does not allow unicode characters for filenames, and I don't see a way in java starlark to get the codepoint of a string character (the ord function is missing). Starlark therefore is not the right tool for producing an vis-encoded mtree text file (you can't create the necessary escape sequence for a unicode-encoded file like 😍.txt).

The mtree_spec rule could use a dedicated tool (like go-mtree, or bsdtar itself) but then you have a runtime dependency on rules_go, or add that to your build and release workflows.

In any case, that's a much bigger PR. I think I will have to move away from using this tar rule. Unless I'm missing something, it can't be used for anything other than toy examples without a bit of a redesign.

@alexeagle
Copy link
Collaborator

We already had to add ord for a base64 implementation in starlark
https://github.com/aspect-build/bazel-lib/blob/0fc838839c41677b18245152ad50759233c53794/lib/private/strings.bzl#L521

So I think that path of fixing the mtree we write following the spec is viable

@alexeagle
Copy link
Collaborator

add that to your build and release workflows

If we did go that route - we already have release automation to build four other Go utilities and expose them as pre-built binary toolchains for users, so that's totally viable as well

@pcj
Copy link
Author

pcj commented Mar 20, 2024

We already had to add ord for a base64 implementation in starlark

Hmm I looked at it and this implementation only works for ascii, would not work for 😍. Need to be able to do something similar to:

echo 😍 | vis -o
\237\230\215

I don't think this is possible in starlark... you can't iterate the individual bytes of a unicode string.

@pcj
Copy link
Author

pcj commented Mar 20, 2024

Interesting, I didn't know git uses something like this too?

$ find find e2e/smoke/testdata
e2e/smoke/testdata/file empty with spaces.txt
e2e/smoke/testdata/file empty with 😍.txt
e2e/smoke/testdata/file-empty.txt

Prepping to commit these files:

$ git status

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	.vscode/
	e2e/smoke/.vscode/
	"e2e/smoke/testdata/file empty with \360\237\230\215.txt"

@pcj pcj linked a pull request Mar 20, 2024 that will close this issue
@pcj
Copy link
Author

pcj commented Mar 20, 2024

@alexeagle I pushed #795 if you want to continue working on it, or feel free to close the PR and start a new one, I don't need the commit attribution.

@betaboon
Copy link

betaboon commented May 5, 2024

hello.
Until we found a good solution in #795 to vendor vis for a proper solution I'm using this patch against bazel-lib to address the whitespace case:

diff --git a/lib/private/tar.bzl b/lib/private/tar.bzl
index 0bc8a2e..db064c1 100644
--- a/lib/private/tar.bzl
+++ b/lib/private/tar.bzl
@@ -161,16 +161,17 @@ def _tar_impl(ctx):
     return DefaultInfo(files = depset([out]), runfiles = ctx.runfiles([out]))

 def _mtree_line(file, type, content = None, uid = "0", gid = "0", time = "1672560000", mode = "0755"):
     spec = [
-        file,
+        file.replace(" ", "\\040"),
         "uid=" + uid,
         "gid=" + gid,
         "time=" + time,
         "mode=" + mode,
         "type=" + type,
     ]
     if content:
+        content = content.replace(" ", "\\040")
         spec.append("content=" + content)
     return " ".join(spec)

 # This function exactly same as the one from "@aspect_bazel_lib//lib:paths.bzl"

via:

    http_archive(
        name = "aspect_bazel_lib",
        sha256 = "a8a92645e7298bbf538aa880131c6adb4cf6239bbd27230f077a00414d58e4ce",
        strip_prefix = "bazel-lib-2.7.2",
        url = "https://github.com/aspect-build/bazel-lib/releases/download/v2.7.2/bazel-lib-v2.7.2.tar.gz",
        patches = [
            # workaround for https://github.com/aspect-build/bazel-lib/issues/794
            "//:[email protected]_spec.patch",
        ],
        patch_args = ["-p1"],
    )

alexeagle added a commit that referenced this issue May 7, 2024
This is a partial solution to #794 that covers the most common case needing escaping.
See #795 for a principled but blocked fix that requires a BSD vis executable
alexeagle added a commit that referenced this issue May 7, 2024
This is a partial solution to #794 that covers the most common case needing escaping.
See #795 for a principled but blocked fix that requires a BSD vis executable
alexeagle added a commit that referenced this issue May 7, 2024
This is a partial solution to #794 that covers the most common case needing escaping.
See #795 for a principled but blocked fix that requires a BSD vis executable
plobsing added a commit to plobsing/bazel-lib that referenced this issue Nov 22, 2024
Bazel's Starlark does not provide access to a string's bytes, only its
codepoints, so we are unable to do this escaping in Starlark. So a
second pass is needed, at least until the spec and implementation work
to get a [`bytes` type](bazelbuild/starlark#112) lands.

Fixes bazel-contrib#794
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working untriaged Requires traige
Projects
None yet
3 participants