Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

65 create test to make sure validations stay in sync with 2024 validationscsv #69

Conversation

jcadam14
Copy link
Contributor

@jcadam14 jcadam14 commented Dec 5, 2023

This adds a pytest (test_csv_differences.py) to validate our python code against the CSV located at https://raw.githubusercontent.com/cfpb/sbl-content/main/fig-files/validation-spec/2024-validations.csv

This will compare error/warning codes (making sure neither the code nor csv have codes the other doesn't), the type (error or warning) and the description.

Special note is taken of E2014 and E2015 due to formatting in the CSV. In the near future when the frontend is ready to start displaying error/warning descriptions, discussions will be had to figure out how we want to display the more complicated descriptions and what sort of formatting the backend should have. Right now, we preserve as much of the formatting as we can but the pytest will also strip all of this off for these two errors (or any others added to the remove_formatting list) and compare just character data. In general, we do NOT want to do this because several strings in our python code were missing spaces and other standard grammatical formatting, and stripping that off would have caused the test to improperly accept that description.

This story is being worked in conjunction with #68 which is being used to update the phase_validations.py for other discrepancies found during testing. It is being routinely merged into this branch to properly run the pytest.

Several changes and discrepancies existed between the python code and CSV.
This corrects those issues.
…he python code

and the csv located at https://raw.githubusercontent.com/cfpb/sbl-content/main/fig-files/validation-spec/2024-validations.csv

There was no automated check to verify that the code and csv were in sync.  Because changes to the csv had been made
but not flowed into the code, several differences were found.  This will make it easier to identify changes and implement them
in the code or correct them in the csv.

https://raw.githubusercontent.com/cfpb/sbl-content/main/fig-files/validation-spec/2024-validations.csv
Black linter didn't pass
Realized I could simplify my list of list comprehension

Better linting
Easier readability
To pass the Ruff linter
…e-test-to-make-sure-validations-stay-in-sync-with-2024-validationscsv
…2014 and E2015

so that just the strings can be compared (without spaces, new lines, bullets, etc)

Because the CSV has formatting for E2014 and E2015 that is more complicated than
just a couple of sentences, I added in the pytest to remove all formatting for those
error codes so that just character data is compared.
(new line characters, 'bullets', etc)

Just to keep as close to the csv as we can, even though eventually this formatting
will change once discussions on error formatting occur
Updated to pass linters
To keep consistent with the CSV. This structure is stripped away
though during the automated testing in #65
so the pytest passes and the code and csv are in sync
…e-test-to-make-sure-validations-stay-in-sync-with-2024-validationscsv
…e-test-to-make-sure-validations-stay-in-sync-with-2024-validationscsv
To pass the linting
Copy link

github-actions bot commented Dec 6, 2023

Coverage report

Click to see where and how coverage changed

FileStatementsMissingCoverageCoverage
(new stmts)
Lines missing
  regtech_data_validator
  global_data.py
  phase_validations.py
Project Total  

This report was generated by python-coverage-comment-action

@jcadam14 jcadam14 self-assigned this Dec 21, 2023
@jcadam14
Copy link
Contributor Author

jcadam14 commented Dec 21, 2023

For some reason the black linting doesn't format in VSCode, and branch 68, which has the same file, doesn't fail on the lines this linter does

Added the errors.csv spit out by the unit test to .gitignore
@lchen-2101
Copy link
Collaborator

maybe try running the linter outside vs code? I've found vs code formatting doesn't always conform with the cli setup. Try poetry run black . in the code's root directory

def test_csv_differences(self):
vals = get_phase_1_and_2_validations_for_lei()
code_descs = [
[s.title, s.severity, s.description]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't make too big of a difference for only 3 fields, but for better readability I would rather this be a dictionary; so later on it becomes c["title"] instead of c[0]

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh... nvm... the csv matching from github...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually... since the csv has headers, if we can, let's try using heading for that as well if possible; so match with keys, instead of indices; gives more context


with open("errors.csv", "w") as error_file:
for c in code_descs:
found_cd = [d for d in csv_descs if d[0] == c[0]]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe next((d for d in csv_descs if d[0] == c[0]), None), so you won't have to do x[0][0] all the time

with open("errors.csv", "w") as error_file:
for c in code_descs:
found_cd = [d for d in csv_descs if d[0] == c[0]]
if c[0] in self.remove_formatting_codes and len(found_cd) != 0:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like quite a bit of len(found_cd) checks; shortcut:

if found_cd := next((d for d in csv_descs if d[0] == c[0]), None):
    // do all the things without needing to check found_cd again

…ow the comparisons are being done.

This uses the dataframe headers from the csv, and creates objects from the phase_validations.py json to compare
using column/field names instead of indices.  Makes it much more readable
Copy link
Collaborator

@lchen-2101 lchen-2101 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@lchen-2101
Copy link
Collaborator

oh yeah, please run linters before merging

@jcadam14 jcadam14 merged commit 4311e8c into main Jan 2, 2024
3 checks passed
jcadam14 added a commit that referenced this pull request May 3, 2024
…tionscsv (#69)

This adds a pytest (test_csv_differences.py) to validate our python code
against the CSV located at
https://raw.githubusercontent.com/cfpb/sbl-content/main/fig-files/validation-spec/2024-validations.csv

This will compare error/warning codes (making sure neither the code nor
csv have codes the other doesn't), the type (error or warning) and the
description.

Special note is taken of E2014 and E2015 due to formatting in the CSV.
In the near future when the frontend is ready to start displaying
error/warning descriptions, discussions will be had to figure out how we
want to display the more complicated descriptions and what sort of
formatting the backend should have. Right now, we preserve as much of
the formatting as we can but the pytest will also strip all of this off
for these two errors (or any others added to the remove_formatting list)
and compare just character data. In general, we do NOT want to do this
because several strings in our python code were missing spaces and other
standard grammatical formatting, and stripping that off would have
caused the test to improperly accept that description.

This story is being worked in conjunction with #68 which is being used
to update the phase_validations.py for other discrepancies found during
testing. It is being routinely merged into this branch to properly run
the pytest.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Create test to make sure validations stay in sync with 2024-validations.csv
2 participants