Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Replace main.py with Typer-based CLI app #63

Merged
merged 35 commits into from
Nov 8, 2023
Merged

Conversation

hkeeler
Copy link
Member

@hkeeler hkeeler commented Nov 2, 2023

Replace main.py to Typer CLI app

The CLI setup in main.py is pretty primitive, and is really geared towards us, developers on this project. I would like to create a utility that outside technical users can use to test their files against our validation logic. To get there without having to reinvent many wheels, we need an actual CLI framework. That's where Typer comes in.

Typer is a wrapper around Click, written by the same dev as FastAPI, so it follows many of the same patterns as we're used to there, such as leaning into Python's typing and Pydantic for example.

Return validation results as Pandas DataFrame instead of vanilla dict

I also wanted to support multiple output formats; at least json, csv, and default to a human-readable table. And while we could do that with a dict, it's much easier to work with DataFrames as they're already tabular in nature and have a ton of transform functionality built-in. So, I have also refactored the validate function to return a Panda's DataFrame instead of a vanilla dict. This made it much easier to output validation data in multiple formats.

Along with this, I did a fair bit of refactoring of that surrounding code, breaking up the big monolith function into smaller testable functions.

cfpb-val script

To simply using the CLI, I've added a shortcut script so you don't have to call the full path to the Python script. In it's current form, you can now call just...

cfpb-val --help

...instead of...

python regtech_data_validator/cli.py --help 

Better script name?

I'm not in love with cfpb-val. I started off with cfpb-comply, but I didn't love that either. I'm looking for something that's short, but also very clear that it's CFPB-related, and having to do with this project. cfpb-rdv?

Lots of README.md love

I've reworked the README.md quite a bit, focusing on users coming to this repo for the first time, and trying to get the CLI to work. This includes pushing the Poetry-based install up to the top and simplifying it, slimming and pushing down development-related setup, and adding details about how to use the CLI itself.

See: https://github.com/cfpb/regtech-data-validator/blob/add-typer-cli/README.md

Refactor lei arg to more generic context

In attempt to future-proof this a bit, I've added a multi-value context arg to the CLI, which allows you to pass arbitrary <key>=<value> pairs, lei being the only key that's current used for anything.

Other tidbits I found along the way

  • Removed deprecated python.formatting.provider VS Code configuration
  • Removes no longer used Makefile
  • Fixes several validations that had incorrect IDs, severities, or were in the wrong phase

- Fixed all imports
- Fixed test and coverage settings in pyproject.toml
- Removed all Python path magic in __init.py__ files
- Moved data files into the repo, and used `importlib` to load files by
  package name instead of path. This is more portable, especially once
  we turn this into a distributable package.
- Refactored global_data to only load data once at module load time
- Move file format settings from config.py into respective data
  transform scripts
- Move src/dest file settings from config.py to CLI args
- Use consistent CLI arg and file exists handling
- Add openpyxl dependency for handling NAICS Excel reading
- Use required enum-based `severity` arg over boolean `warning`
  with default to false. This default is likely partially the cause of
  several warning-level validations being  set to error.
- Remove `test_checks.py` as those tests are no longer needed with
  refactored `SBLCheck`.
- Refactor JSON output to use `severity` enum value
- Refactor exception handling for Pandera's `SchemaErrors`
…into add-typer-cli

Conflicts:
	poetry.lock
	pyproject.toml
	regtech_data_validator/create_schemas.py
	regtech_data_validator/main.py
	regtech_data_validator/phase_validations.py
	tests/test_sample_data.py
Copy link

github-actions bot commented Nov 2, 2023

Coverage report

The coverage rate went from 85.13% to 92.94% ⬆️
The branch rate is 89%.

96.8% of new lines are covered.

Diff Coverage details (click to unfold)

regtech_data_validator/cli.py

96.05% of new lines are covered (95.28% of the complete file).
Missing lines: 145, 146, 155

regtech_data_validator/create_schemas.py

97.87% of new lines are covered (92.4% of the complete file).
Missing lines: 149

regtech_data_validator/phase_validations.py

100% of new lines are covered (100% of the complete file).

When you only have a single sub-command, `validate` in this case,
Typer seems to automatically move that functionality to the top-level
command. To get around that strangeness, I've re-added the `describe`
command that's intended to print out the file format and the validation
list, but isn't implemented yet. For now if you call it, it simply
reports _Feature coming soon..._

We'll implement this for real in a follow-up PR sometime soon.
README.md Outdated Show resolved Hide resolved
@lchen-2101
Copy link
Collaborator

Also making a note that devcontainer and native poetry doesn't seem to play nice together. When I ran the code in my local terminal outside of devcontainer, poetry shell wasn't working properly until I did a poetry install to get all the dependencies sync'ed up again; but that seems to break the dependencies for the devcontainer which then requires a poetry install in the vs code shell running in devcontainer.

@hkeeler
Copy link
Member Author

hkeeler commented Nov 7, 2023

Also making a note that devcontainer and native poetry doesn't seem to play nice together. When I ran the code in my local terminal outside of devcontainer, poetry shell wasn't working properly until I did a poetry install to get all the dependencies sync'ed up again; but that seems to break the dependencies for the devcontainer which then requires a poetry install in the vs code shell running in devcontainer.

Spinning this off into follow-up issue #64.

@hkeeler
Copy link
Member Author

hkeeler commented Nov 7, 2023

@lchen-2101, can you take another peek? I fixed up the bits you mentioned, and I also split up the CLI's output handling into separate functions for easier testing. I've also added tests for all that. The coverage rate is now back up over 85%.

Copy link
Collaborator

@lchen-2101 lchen-2101 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@lchen-2101 lchen-2101 merged commit 3384ef2 into main Nov 8, 2023
2 of 3 checks passed
@lchen-2101 lchen-2101 deleted the add-typer-cli branch November 8, 2023 14:55
jcadam14 pushed a commit that referenced this pull request May 3, 2024
### Replace `main.py` to [Typer](https://typer.tiangolo.com/) CLI app

The CLI setup in `main.py` is pretty primitive, and is really geared
towards _us_, developers on this project. I would like to create a
utility that outside technical users can use to test their files against
our validation logic. To get there without having to reinvent many
wheels, we need an actual CLI framework. That's where
[Typer](https://typer.tiangolo.com/) comes in.

Typer is a wrapper around
[Click](https://click.palletsprojects.com/en/8.1.x/), written by the
same dev as FastAPI, so it follows many of the same patterns as we're
used to there, such as leaning into Python's `typing` and
[Pydantic](https://docs.pydantic.dev/latest/) for example.


### Return validation results as Pandas `DataFrame` instead of vanilla
`dict`

I also wanted to support multiple output formats; at least json, csv,
and default to a human-readable table. And while we _could_ do that with
a `dict`, it's much easier to work with `DataFrame`s as they're already
tabular in nature and have a ton of transform functionality built-in.
So, I have also refactored the `validate` function to return a Panda's
`DataFrame` instead of a vanilla `dict`. This made it much easier to
output validation data in multiple formats.

Along with this, I did a fair bit of refactoring of that surrounding
code, breaking up the big monolith function into smaller testable
functions.

### `cfpb-val` script

To simply using the CLI, I've added a shortcut script so you don't have
to call the full path to the Python script. In it's current form, you
can now call just...

    cfpb-val --help

...instead of...

    python regtech_data_validator/cli.py --help 

#### Better script name?

I'm not in love with `cfpb-val`. I started off with `cfpb-comply`, but I
didn't love that either. I'm looking for something that's short, but
also very clear that it's CFPB-related, and having to do with this
project. `cfpb-rdv`?

### Lots of `README.md` love

I've reworked the `README.md` quite a bit, focusing on users coming to
this repo for the first time, and trying to get the CLI to work. This
includes pushing the Poetry-based install up to the top and simplifying
it, slimming and pushing down development-related setup, and adding
details about how to use the CLI itself.

See:
https://github.com/cfpb/regtech-data-validator/blob/add-typer-cli/README.md

### Refactor `lei` arg to more generic `context`

In attempt to future-proof this a bit, I've added a multi-value
`context` arg to the CLI, which allows you to pass arbitrary
`<key>=<value>` pairs, `lei` being the only key that's current used for
anything.

### Other tidbits I found along the way

- Removed deprecated `python.formatting.provider` VS Code configuration
- Removes no longer used `Makefile`
- Fixes several validations that had incorrect IDs, severities, or were
in the wrong phase

---------

Co-authored-by: lchen-2101 <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants