-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Replace main.py
with Typer-based CLI app
#63
Conversation
- Fixed all imports - Fixed test and coverage settings in pyproject.toml - Removed all Python path magic in __init.py__ files - Moved data files into the repo, and used `importlib` to load files by package name instead of path. This is more portable, especially once we turn this into a distributable package. - Refactored global_data to only load data once at module load time
- Move file format settings from config.py into respective data transform scripts - Move src/dest file settings from config.py to CLI args - Use consistent CLI arg and file exists handling - Add openpyxl dependency for handling NAICS Excel reading
- Use required enum-based `severity` arg over boolean `warning` with default to false. This default is likely partially the cause of several warning-level validations being set to error. - Remove `test_checks.py` as those tests are no longer needed with refactored `SBLCheck`. - Refactor JSON output to use `severity` enum value - Refactor exception handling for Pandera's `SchemaErrors`
…into add-typer-cli Conflicts: poetry.lock pyproject.toml regtech_data_validator/create_schemas.py regtech_data_validator/main.py regtech_data_validator/phase_validations.py tests/test_sample_data.py
Coverage reportThe coverage rate went from
Diff Coverage details (click to unfold)regtech_data_validator/cli.py
regtech_data_validator/create_schemas.py
regtech_data_validator/phase_validations.py
|
When you only have a single sub-command, `validate` in this case, Typer seems to automatically move that functionality to the top-level command. To get around that strangeness, I've re-added the `describe` command that's intended to print out the file format and the validation list, but isn't implemented yet. For now if you call it, it simply reports _Feature coming soon..._ We'll implement this for real in a follow-up PR sometime soon.
Also making a note that devcontainer and native poetry doesn't seem to play nice together. When I ran the code in my local terminal outside of devcontainer, poetry shell wasn't working properly until I did a |
Spinning this off into follow-up issue #64. |
@lchen-2101, can you take another peek? I fixed up the bits you mentioned, and I also split up the CLI's output handling into separate functions for easier testing. I've also added tests for all that. The coverage rate is now back up over 85%. |
Co-authored-by: lchen-2101 <[email protected]>
…validator into add-typer-cli
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
### Replace `main.py` to [Typer](https://typer.tiangolo.com/) CLI app The CLI setup in `main.py` is pretty primitive, and is really geared towards _us_, developers on this project. I would like to create a utility that outside technical users can use to test their files against our validation logic. To get there without having to reinvent many wheels, we need an actual CLI framework. That's where [Typer](https://typer.tiangolo.com/) comes in. Typer is a wrapper around [Click](https://click.palletsprojects.com/en/8.1.x/), written by the same dev as FastAPI, so it follows many of the same patterns as we're used to there, such as leaning into Python's `typing` and [Pydantic](https://docs.pydantic.dev/latest/) for example. ### Return validation results as Pandas `DataFrame` instead of vanilla `dict` I also wanted to support multiple output formats; at least json, csv, and default to a human-readable table. And while we _could_ do that with a `dict`, it's much easier to work with `DataFrame`s as they're already tabular in nature and have a ton of transform functionality built-in. So, I have also refactored the `validate` function to return a Panda's `DataFrame` instead of a vanilla `dict`. This made it much easier to output validation data in multiple formats. Along with this, I did a fair bit of refactoring of that surrounding code, breaking up the big monolith function into smaller testable functions. ### `cfpb-val` script To simply using the CLI, I've added a shortcut script so you don't have to call the full path to the Python script. In it's current form, you can now call just... cfpb-val --help ...instead of... python regtech_data_validator/cli.py --help #### Better script name? I'm not in love with `cfpb-val`. I started off with `cfpb-comply`, but I didn't love that either. I'm looking for something that's short, but also very clear that it's CFPB-related, and having to do with this project. `cfpb-rdv`? ### Lots of `README.md` love I've reworked the `README.md` quite a bit, focusing on users coming to this repo for the first time, and trying to get the CLI to work. This includes pushing the Poetry-based install up to the top and simplifying it, slimming and pushing down development-related setup, and adding details about how to use the CLI itself. See: https://github.com/cfpb/regtech-data-validator/blob/add-typer-cli/README.md ### Refactor `lei` arg to more generic `context` In attempt to future-proof this a bit, I've added a multi-value `context` arg to the CLI, which allows you to pass arbitrary `<key>=<value>` pairs, `lei` being the only key that's current used for anything. ### Other tidbits I found along the way - Removed deprecated `python.formatting.provider` VS Code configuration - Removes no longer used `Makefile` - Fixes several validations that had incorrect IDs, severities, or were in the wrong phase --------- Co-authored-by: lchen-2101 <[email protected]>
Replace
main.py
to Typer CLI appThe CLI setup in
main.py
is pretty primitive, and is really geared towards us, developers on this project. I would like to create a utility that outside technical users can use to test their files against our validation logic. To get there without having to reinvent many wheels, we need an actual CLI framework. That's where Typer comes in.Typer is a wrapper around Click, written by the same dev as FastAPI, so it follows many of the same patterns as we're used to there, such as leaning into Python's
typing
and Pydantic for example.Return validation results as Pandas
DataFrame
instead of vanilladict
I also wanted to support multiple output formats; at least json, csv, and default to a human-readable table. And while we could do that with a
dict
, it's much easier to work withDataFrame
s as they're already tabular in nature and have a ton of transform functionality built-in. So, I have also refactored thevalidate
function to return a Panda'sDataFrame
instead of a vanilladict
. This made it much easier to output validation data in multiple formats.Along with this, I did a fair bit of refactoring of that surrounding code, breaking up the big monolith function into smaller testable functions.
cfpb-val
scriptTo simply using the CLI, I've added a shortcut script so you don't have to call the full path to the Python script. In it's current form, you can now call just...
...instead of...
Better script name?
I'm not in love with
cfpb-val
. I started off withcfpb-comply
, but I didn't love that either. I'm looking for something that's short, but also very clear that it's CFPB-related, and having to do with this project.cfpb-rdv
?Lots of
README.md
loveI've reworked the
README.md
quite a bit, focusing on users coming to this repo for the first time, and trying to get the CLI to work. This includes pushing the Poetry-based install up to the top and simplifying it, slimming and pushing down development-related setup, and adding details about how to use the CLI itself.See: https://github.com/cfpb/regtech-data-validator/blob/add-typer-cli/README.md
Refactor
lei
arg to more genericcontext
In attempt to future-proof this a bit, I've added a multi-value
context
arg to the CLI, which allows you to pass arbitrary<key>=<value>
pairs,lei
being the only key that's current used for anything.Other tidbits I found along the way
python.formatting.provider
VS Code configurationMakefile