-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] DEV-1849: Add yaml linter, formatter and unix-script for replacing chars. #984
base: develop
Are you sure you want to change the base?
Conversation
yamllint is now on always. yamlfmt is only enabled during manual stage. E.g. $ pre-commit run --hook-stage manual yamlfmt --files src/gdcdictionary/schemas/aligned_reads.yaml The idea is to convert a few files at the time until they're all well formatted.
An script that sed all schema files and replaces unicode en-dash “–” (U+2013) with ascii dash.
# TODO: Remove stages so it always run. Do it once all yamls are properly formatted. | ||
stages: [manual] | ||
|
||
- repo: https://github.com/adrienverge/yamllint.git |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When adding libraries, check the license. This code is GPL'd. We cannot use it.
We need more permissible licenses, such as MIT or Apache 2.0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even when we neither distribute the tool, nor modify it, nor include it in our distribution?
(wouldn't using this tool be equivalent to using any other gnu-tool in our build system?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's a fair question and, to my knowledge, a grey area. as an interpreted language, python (mostly) does not have the concept of "linking". so, there's little distinction between a library and source code. are you aware of any legal precedents that have clarified this? if not, we should err on the side of caution and not use GPL'd python code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point.
@stilesj-uchicago and I had the same understanding => "we're good in this case".
We're using yamlfmt (mit) in gdcapi but not yamllint (gplv3).
Maybe I can remove yamllint until we find an alternative?
- repo: local | ||
hooks: | ||
- id: replacer | ||
# More chars can be added | ||
name: Replace unicode dashes with ascii dashes. | ||
language: system | ||
pass_filenames: false | ||
entry: | | ||
sh -c " | ||
# echo -n '–' | hexdump -C # > e2 80 93 | ||
LC_ALL=C | ||
for f in $(ls src/gdcdictionary/schemas/*.yaml); do | ||
# echo 'Converting ' $f | ||
sed -i '' 's/\xe2\x80\x93/-/g' $f | ||
done | ||
" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- This isn't going to work for everyone, such as Windows users.
- If we are going to enforce ascii, we could use require-ascii. It's catch much more than just the hyphen. We'll need to exclude these files, as described in this comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- This isn't going to work for everyone, such as Windows users.
Good point. Should it be a deal breaker?
- If we are going to enforce ascii, we could use require-ascii. It's catch much more than just the hyphen. We'll need to exclude these files, as described in this comment.
The catch is that unicode chars are allowed. Depending on the version of the yaml file (which we do NOT explicitly define) and the version of the parser (somehow loose atm), one should either quote them or do whatever applies.
When I first saw the failure and the traceback, I was inclined to remove the whole "ascii" enforcement and instead specify yaml versions and straighten up all the versions (yaml files and parser).
I think we should pursue that end. It might take some bit effort but as long as we can diff the current json schema with the new one and make sure the match, we should be ok. IMHO.
pass_filenames: false | ||
entry: | | ||
sh -c " | ||
# echo -n '–' | hexdump -C # > e2 80 93 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a great comment 👍
According to [tox docs], the exit code of any command preceeded by a dash is ignored. [tox docs]: https://tox.wiki/en/latest/config.html#commands
yamlfmt
as manual stage.replacer
andyamllint
, to the commit stage.yamlfmt
is already being used by gdcapi as a pre-commit hookreplacer
is a custom shell script that changes occurrences of any en-hyphen with an ascii dash. This will remove the possibility of inadvertently using en-hyphen as list-item structural marker.yamllint
performs various extra structural and semantics checks.The reason to leave
yamlfmt
in the manual stage is so that all files that are currently reporting warnings can be fixed gradually. Once they are all fixed, the hook can be permanently turned on by removing thestages: [manual]
attribute.To run
yamlfmt
on demand, use the following command: