65 create test to make sure validations stay in sync with 2024 validationscsv (#69)

This adds a pytest (test_csv_differences.py) to validate our python code
against the CSV located at
https://raw.githubusercontent.com/cfpb/sbl-content/main/fig-files/validation-spec/2024-validations.csv

This compares the error/warning codes (making sure neither the code nor
the CSV has codes the other doesn't), the type (error or warning), and
the description.
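A minimal sketch of those three comparisons, assuming both sides have already been loaded into dictionaries keyed by validation code. `Rule` and `compare_rules` are hypothetical names used only for illustration; they are not the names used in `test_csv_differences.py`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical record for one validation: its severity and description."""
    severity: str  # "error" or "warning"
    description: str


def compare_rules(python_rules: dict[str, Rule], csv_rules: dict[str, Rule]) -> list[str]:
    """Return human-readable differences between two code -> Rule mappings."""
    problems = []
    # Codes that exist on only one side.
    for code in sorted(python_rules.keys() - csv_rules.keys()):
        problems.append(f"{code}: in python code but not in CSV")
    for code in sorted(csv_rules.keys() - python_rules.keys()):
        problems.append(f"{code}: in CSV but not in python code")
    # For codes on both sides, compare severity, then the exact description text.
    for code in sorted(python_rules.keys() & csv_rules.keys()):
        py_rule, csv_rule = python_rules[code], csv_rules[code]
        if py_rule.severity != csv_rule.severity:
            problems.append(f"{code}: severity {py_rule.severity!r} != {csv_rule.severity!r}")
        if py_rule.description != csv_rule.description:
            problems.append(f"{code}: description mismatch")
    return problems
```

Using set operations on `dict.keys()` reports missing codes in both directions in a single pass, which mirrors the "neither side has codes the other doesn't" requirement.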

Special note is taken of E2014 and E2015 due to formatting in the CSV.
In the near future, when the frontend is ready to start displaying
error/warning descriptions, we will discuss how we want to display the
more complicated descriptions and what sort of formatting the backend
should have. For now, we preserve as much of the formatting as we can,
but the pytest also strips all of it off for these two errors (or any
others added to the remove_formatting list) and compares just the
character data. In general, we do NOT want to do this, because several
strings in our python code were missing spaces and other standard
grammatical formatting, and stripping that off would have caused the
test to wrongly accept those descriptions.
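A sketch of that exception, assuming "formatting" means whitespace (spaces, tabs, new lines); the real test may strip more, such as list markers. `REMOVE_FORMATTING` is a hypothetical set standing in for the `remove_formatting` list, and `strip_formatting` and `descriptions_match` are illustrative names.

```python
# Hypothetical stand-in for the remove_formatting list; the commit names E2014 and E2015.
REMOVE_FORMATTING = {"E2014", "E2015"}


def strip_formatting(text: str) -> str:
    """Drop all whitespace (spaces, tabs, newlines) so only character data remains."""
    return "".join(text.split())


def descriptions_match(code: str, python_desc: str, csv_desc: str) -> bool:
    """Exact comparison, except for codes whose CSV layout cannot be reproduced."""
    if code in REMOVE_FORMATTING:
        return strip_formatting(python_desc) == strip_formatting(csv_desc)
    # Everywhere else, compare exactly so a missing space is still reported.
    return python_desc == csv_desc
```

Note that `strip_formatting("Mustbe a") == strip_formatting("Must be a")` is true, which is exactly why stripping is limited to the listed codes.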

This story is being worked on in conjunction with #68, which is being
used to update phase_validations.py for other discrepancies found during
testing. #68 is routinely merged into this branch so that the pytest
runs properly.
jcadam14 authored Jan 2, 2024
1 parent d9d1fd3 commit 4311e8c
Showing 7 changed files with 87,401 additions and 10 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -20,3 +20,5 @@ coverage.xml

# excel artifact
~$example_sblar.xlsx

errors.csv
25 changes: 24 additions & 1 deletion README.md
@@ -157,7 +157,6 @@ failed validation.
We use these test files for automated tests, but they can also be passed in via the
`cfpb-val` CLI utility for manual testing.


## Development

### Best practices
@@ -234,6 +233,30 @@ Test coverage details can be found in this project's
branch.


### Testing the FIG CSV

A standard pytest ([`test_csv_differences.py`](tests/test_csv_differences.py)) compares the validation code in [`phase_validations.py`](regtech_data_validator/phase_validations.py)
to the [`FIG CSV`](https://github.com/cfpb/sbl-content/blob/main/fig-files/validation-spec/2024-validations.csv). This test checks that
the list of validation IDs in one matches the other, and reports any IDs missing from either.
It also validates that all severities (error or warning) match. Finally, it
does an exact string comparison of the violation descriptions, with a couple of caveats:
- For any Python validation whose description starts with a single quote, the test first prepends a single quote
to the CSV's description if one is missing. This is done because if someone edits the CSV in Excel,
Excel drops the leading single quote, which it interprets as a formatter telling Excel "this field is a string".
- Certain descriptions in the CSV use 'complex' formatting to produce layouts with lists, new lines, and white space
that may not compare correctly. Since how error descriptions will be formatted on a submission's results page has not been decided yet,
the test currently strips some of this formatting off and compares just the text.
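The single-quote caveat can be sketched as a small normalization step applied to the CSV side before the exact comparison. `normalize_leading_quote` is a hypothetical helper, not necessarily how `test_csv_differences.py` implements it.

```python
def normalize_leading_quote(python_desc: str, csv_desc: str) -> str:
    """Restore a leading single quote that Excel may have dropped from the CSV text."""
    # Only add the quote when the Python description has one and the CSV does not,
    # so descriptions that already agree are left untouched.
    if python_desc.startswith("'") and not csv_desc.startswith("'"):
        return "'" + csv_desc
    return csv_desc
```

For example, `normalize_leading_quote("'field' must not be blank.", "field' must not be blank.")` returns `"'field' must not be blank."`, after which the two strings compare equal.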

This test runs automatically as part of our unit testing pipeline. A developer can also
run it manually with the command `poetry run pytest tests/test_csv_differences.py`.

This creates an `errors.csv` file at the root of the repo that can be used to easily view
the differences found between the two files.
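A minimal sketch of producing such a report with the standard `csv` module. The column layout (`ERRORS_HEADER`) and the function names are assumptions for illustration; the actual `errors.csv` columns may differ.

```python
import csv
import io
from pathlib import Path

# Hypothetical header; the real errors.csv layout may differ.
ERRORS_HEADER = ["validation_id", "field", "python_value", "csv_value"]


def render_errors_csv(differences: list[tuple[str, str, str, str]]) -> str:
    """Serialize one row per difference; multi-line descriptions are quoted by csv.writer."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(ERRORS_HEADER)
    writer.writerows(differences)
    return buf.getvalue()


def write_errors_csv(differences: list[tuple[str, str, str, str]], path: Path = Path("errors.csv")) -> None:
    """Write the report; newline="" keeps embedded newlines in descriptions unchanged."""
    with path.open("w", newline="", encoding="utf-8") as f:
        f.write(render_errors_csv(differences))
```

Keeping rendering separate from file I/O makes the report easy to check in a test without touching the repo root.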

By default, the pytest points to the `main` branch of the sbl-content repo, but a developer
can modify the test to point to a development branch with upcoming changes, run the test with the command above,
and then evaluate what changes may need to be made to the Python validation code.
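One way to make the branch easy to switch is to build the raw-content URL from a branch name. This sketch uses the CSV URL cited in the commit message; the `branch` parameter, the function names, and the `validation_id` column name are assumptions, since the FIG CSV's actual headers are not shown here.

```python
import csv
import io
import urllib.request

# The URL from the commit message, with the branch name pulled out as a placeholder.
FIG_CSV_URL = (
    "https://raw.githubusercontent.com/cfpb/sbl-content/{branch}"
    "/fig-files/validation-spec/2024-validations.csv"
)


def fig_csv_url(branch: str = "main") -> str:
    """Return the raw FIG CSV URL for the given sbl-content branch."""
    return FIG_CSV_URL.format(branch=branch)


def parse_fig_csv(text: str) -> dict[str, dict[str, str]]:
    """Index CSV rows by validation ID (the column name is an assumption)."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["validation_id"]: row for row in reader}


def load_fig_csv(branch: str = "main") -> dict[str, dict[str, str]]:
    """Download and index the FIG CSV; utf-8-sig drops a BOM if Excel added one."""
    with urllib.request.urlopen(fig_csv_url(branch)) as resp:
        return parse_fig_csv(resp.read().decode("utf-8-sig"))
```

With this shape, checking a development branch would be a one-argument change such as `load_fig_csv("my-dev-branch")` (a hypothetical branch name).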

## Linting

This repository utilizes the `black` and `ruff` libraries to check and fix any
Binary file added data/census/raw/CensusFlatFile2023.zip
Binary file not shown.