-
-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Workflow: Publish Data #980
base: main
Are you sure you want to change the base?
Conversation
Basic separation of input/ouput: - $1 and $2 are now static because the workflow is run in an isolated environment and the cwd is always the repo root. - Set dynamic inputs in the workflow as env variables. - Upload built files as a workflow artifact. Git doesn’t preserve complete file permissions, and the zipping/unzipping process doesn’t reliably preserve file permissions. Therefore file permissions need to be normalized at deployment site. The “Copy files from elsewhere” note needs to be documented in data-workflow.md#publication.
Corrected how the dist directory is referenced when moving UCD.zip. UNITOOLS_DATA and DRAFT are no longer necessary. “dist” as a conventional target direction is easier to read than a variable. No need to remove old ZIP file because a workflow runs in an isolated environment.
Use modern syntax $(…) in place of `…`. Quote $MODE in comparison to avoid the confusing “unary operator expected” error when value is empty in testing. Copy directories directly (rather copy content of a directory to a ready-made directory) whenever possible, for simpler code.
Looks promising! Some thoughts:
That could be messy, because the /Public/draft/ folder structure (all files for a version under one root) differs from that of the final structure (different roots by type eg UCD vs. idna, each with version subfolders).
I have an aversion against "polluting the source tree". Could we copy into ../pub or ../dist or similar?
Probably.
Sure. Maybe a shared .sh script that gets included/imported?
I think it would be nice/useful if it was still possible to run the publication script(s) from a local command line. I haven't looked into the nektos/act thing, but since we are mostly running a shell script, it should be easy to still invoke that directly. It would be ok if that meant having to define/export env vars rather than using command line args.
Ever since I added the chmod's to my scripts, I believe that @rickmcgowan has been able to unzip my file drops on the server and didn't need to fix permissions again. Unless he chimes in and says otherwise. @glechner9147 FYI
It is. I just thought it was a useful reminder in the output of the manually-run script. workflow_dispatch options: s/Snapshot/UCD/ We could also consider a list of things to include: emoji, idna, uca, ... |
Please un-delete the old scripts for now, so that we can try out the new workflow while we still have the old scripts for backup. |
Just some conditional paths. Shouldn’t make the .sh file much more complicated. But anyway, we have more than half an year to experiment so I’m not in a hurry to implement that now.
Well you guys already have a massive https://github.com/unicode-org/unicodetools/blob/main/.gitignore file that offers numerous ignored directories. Writing to the filesystem outside the current repo is a worse pollution because it basically breaches a sandbox. But again, I’m not familiar with how you guys use this repo so I’ll leave it to you to decide.
One convention is to have a source controlled .env file.
It is already possible. The external dependencies are those env vars under On the other hand, I split those env vars to the .yml file only because I consider them to be the dynamic parameters of the workflow, while the .sh file is the stable implementation. You can move them back to the .sh file too, which will be more self-contained.
That usage of nektos/act is only for high-fidelity simulation of the GitHub workflow environment (it connects to a Docker instance to actually run the workflow in an isolated environment). For example, without nektos/act, the .sh file’s way of using the date and sed commands isn’t compatible with my platform (macOS).
Likely that’s because what we need on the server is simply 755/644, and that’s the default you get when unzipping files. My point is, ZIP files can’t be used to reliably preserve unusual permission settings. If you actually need to control permissions before zipping a bunch of files, you need to either carefully unzip them with some specific command (which is a fragile process when it’s done by a human), or you need to first make a .tar.
“UCD” is also confusing here. Does it make sense to actually call it “Pre-Alpha” or something? It’s really the release phase that matters. Also now I realized that the whole workflow (including the filenames) shouldn’t be called “Publish UCD”, but should be something like “Publish Data”.
I can easily refactor to that direction as long as you guys give me an exact list. Otherwise, feel free to make the changes directly. There’s some examples of the supported input types: https://docs.github.com/en/actions/writing-workflows/choosing-when-your-workflow-runs/events-that-trigger-workflows#providing-inputs. |
This is gonna make the change history funny, but Markus requested this: unicode-org#980 (comment)
It doesn’t breach anything so long as you control the subtree you work in; it is just a matter of cloning into a subdirectory, as we do in other workflows: unicodetools/.github/workflows/cli-build-instructions.yml Lines 101 to 109 in f731c41
This is a pretty typical « out-of-source » build setup. |
We use unicodetools/unicodetools/src/main/java/org/unicode/text/utility/Settings.java Lines 28 to 30 in f731c41
(I have an issue to add a GAMMA, #936.) This also shows up in the JSPs, with draft properties being accessible with the |
My goal is to propose simple, straightforward code to get the work done. I write minimal code without “good to have” features because it’s easier to review and easier to maintain. But after merging, it’ll be in PAG’s hands for maintenance, and by all means you guys can complicate it to your taste.
I’ll change “Snapshot” to “Dev”. Also I’ve realized that until multiple files start to share some state like the |
-I could be used in place of --iso-8601, but we also need +%Y anyway.
Among other changes in the recent commits, I changed the output directory from “dist” to “pub/tmp”, so now it’s in a dedicated and .gitignore-d directory: 03710ab. |
There is a broader maintenance issue here, which is that the yearly exercise of #931 is getting steadily messier with a dozen files in various languages independently carrying the current version, but I agree that this is far out of scope for this PR. I should probably do something about it, since this also ties into the γ phase thing.
The reviewers here, who are maintainers of this codebase, are also striving for ease of review and maintenance, rather than tasteful complication. |
.github/workflows/publish-data.sh
Outdated
mv $TMP/UCD/ucd/version-ReadMe.txt $TMP/UCD/ReadMe.txt | ||
rm -r $TMP/UCD/ucd/Unihan | ||
|
||
if [ "$RELEASE_PHASE" = "Dev" ]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would normally go with bash conditional expressions (with [[
]]
) rather than the test
command (aka [
).
@markusicu, I am unsure about our bash style here, which one do we prefer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn’t familiar with the difference between [
and [[
, but now after some reading I do prefer [[
as it offers an experience similar to modern languages. Made the change.
The pub directory is not included in the sparse checkout, and the nektos/act tool for testing locally doesn’t do an actual sparse checkout (it does `docker cp` instead) so I didn’t catch this issue locally.
See the fork repo for test runs: https://github.com/lianghai/unicodetools/actions/workflows/publish-data.yml.
See commit messages for rationale behind the changes. The branch name “publish-ucd” is a misnomer, as this workflow covers more than UCD.
To do:
workflow_dispatch
interface’s “Use workflow from” dropdown already lets one to run from a selected ref (branch or tag). It only defaults to the main branch.date
together with PUB_DATE.date
compatible with macOS, for convenience of developmentFuture considerations:
sed
compatible with macOS, for convenience of developmentConsidered:
Can UNI_VER and EMOJI_VER be stored in a central location in the repo?unicodetools/unicodetools/src/main/java/org/unicode/text/utility/Settings.java
Line 25 in f731c41
GitHub releases?