Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] DEV-1849: Add yaml linter, formatter and unix-script for replacing chars. #984

Open
wants to merge 12 commits into
base: develop
Choose a base branch
from

Conversation

wrkhenddher
Copy link

@wrkhenddher wrkhenddher commented May 12, 2023

  • Add pre-commit hook yamlfmt as manual stage.
  • Add pre-commit hooks, replacer and yamllint, to the commit stage.

yamlfmt is already being used by gdcapi as a pre-commit hook

replacer is a custom shell script that changes occurrences of any en-hyphen with an ascii dash. This will remove the possibility of inadvertently using en-hyphen as list-item structural marker.

yamllint performs various extra structural and semantics checks.

The reason to leave yamlfmt in the manual stage is so that all files that are currently reporting warnings can be fixed gradually. Once they are all fixed, the hook can be permanently turned on by removing the stages: [manual] attribute.

To run yamlfmt on demand, use the following command:

$ pre-commit run --hook-stage manual yamlfmt --files <files...>

# E.g. 
# pre-commit run --hook-stage manual yamlfmt --files src/gdcdictionary/schemas/aligned_reads.yaml

Henddher Pedroza added 4 commits May 12, 2023 10:46
yamllint is now on always.
yamlfmt is only enabled during manual stage.
E.g.
  $ pre-commit run --hook-stage manual yamlfmt --files src/gdcdictionary/schemas/aligned_reads.yaml
The idea is to convert a few files at the time until they're all well formatted.
An script that sed all schema files and replaces unicode en-dash
“–” (U+2013) with ascii dash.
# TODO: Remove stages so it always run. Do it once all yamls are properly formatted.
stages: [manual]

- repo: https://github.com/adrienverge/yamllint.git
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When adding libraries, check the license. This code is GPL'd. We cannot use it.

We need more permissible licenses, such as MIT or Apache 2.0.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even when we neither distribute the tool, nor modify it, nor include it in our distribution?

(wouldn't using this tool be equivalent to using any other gnu-tool in our build system?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's a fair question and, to my knowledge, a grey area. as an interpreted language, python (mostly) does not have the concept of "linking". so, there's little distinction between a library and source code. are you aware of any legal precedents that have clarified this? if not, we should err on the side of caution and not use GPL'd python code.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point.

@stilesj-uchicago and I had the same understanding => "we're good in this case".

We're using yamlfmt (mit) in gdcapi but not yamllint (gplv3).

Maybe I can remove yamllint until we find an alternative?

Comment on lines +26 to +41
- repo: local
hooks:
- id: replacer
# More chars can be added
name: Replace unicode dashes with ascii dashes.
language: system
pass_filenames: false
entry: |
sh -c "
# echo -n '–' | hexdump -C # > e2 80 93
LC_ALL=C
for f in $(ls src/gdcdictionary/schemas/*.yaml); do
# echo 'Converting ' $f
sed -i '' 's/\xe2\x80\x93/-/g' $f
done
"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. This isn't going to work for everyone, such as Windows users.
  2. If we are going to enforce ascii, we could use require-ascii. It's catch much more than just the hyphen. We'll need to exclude these files, as described in this comment.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. This isn't going to work for everyone, such as Windows users.

Good point. Should it be a deal breaker?

  1. If we are going to enforce ascii, we could use require-ascii. It's catch much more than just the hyphen. We'll need to exclude these files, as described in this comment.

The catch is that unicode chars are allowed. Depending on the version of the yaml file (which we do NOT explicitly define) and the version of the parser (somehow loose atm), one should either quote them or do whatever applies.

When I first saw the failure and the traceback, I was inclined to remove the whole "ascii" enforcement and instead specify yaml versions and straighten up all the versions (yaml files and parser).

I think we should pursue that end. It might take some bit effort but as long as we can diff the current json schema with the new one and make sure the match, we should be ok. IMHO.

pass_filenames: false
entry: |
sh -c "
# echo -n '–' | hexdump -C # > e2 80 93
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a great comment 👍

@wrkhenddher wrkhenddher changed the title DEV-1849: Fix unicode chars in schema files. DEV-1849: Add yaml linter, formatter and unix-script for replacing chars. May 17, 2023
@wrkhenddher wrkhenddher changed the title DEV-1849: Add yaml linter, formatter and unix-script for replacing chars. [WIP] DEV-1849: Add yaml linter, formatter and unix-script for replacing chars. May 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants