refactor: standardize repo structure and other prep for open-sourcing (#60)

Grab bag of tune-ups in prep for open-sourcing this repo.

1. Restructure repo to be more compliant with modern Python projects.
   1. Move `tests` out to top-level directory.
   2. Rename `src/validator` to `regtech_data_validator`.
2. Consolidate external datasource code and data to `data` dir.
   1. Move `config.py` settings into their respective scripts; file paths are now passed in as CLI args instead.
3. Move processed CSV files into the project itself. This allows for simpler data lookups by package name via `importlib.resources`, which in turn allowed the removal of the `ROOT_PATH` Python path logic in all of the `__init__.py`s.
4. Refactor `global_data.py` to load data only once, when the module is first imported.
5. Refactor `SBLCheck`:
   1. Replace `warning: bool` with a more explicit `severity`, backed by an enum that only allows `ERROR` and `WARNING`.
      1. Several of the warning-level validations were not setting `warning=True`, and were thus defaulting to `False`. This will prevent that. I also fixed all these instances.
      2. Removes the need for translation to `severity` when building JSON output.
   2. Use explicit args in the constructor, and pass all shared args on to the parent class. This removes the need for the arg `name`/`id` error handling.
6. Switch CLI output from Python dict to JSON.
7. Roll back the `black` version used in the linting Action due to a bug in the latest version.
   - psf/black#3953

**Note:** Some of the files that I both moved _and_ changed now seem to show as having deleted the old file and created a new one. I'm not sure why it's doing this. I did the moves and changes in separate commits, which usually prevents this, but that doesn't seem to be the case here. Perhaps there's just so much change in some that git considers it a whole new file? 🤷 It's kind of annoying, especially if it results in losing git history for those files.
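Items 3 and 4 above can be pictured with a minimal sketch: look up a CSV bundled inside a package through `importlib.resources`, and cache the parsed result so the file is read only once. The package name `demo_validator`, the file `data/naics_codes.csv`, the two sample rows, and the helper `load_naics_codes` are all hypothetical stand-ins; the sketch builds a throwaway package in a temp directory so it runs on its own, and it is not the repo's actual `global_data.py`.

```python
import csv
import importlib
import sys
import tempfile
from functools import lru_cache
from importlib.resources import files
from pathlib import Path

# Build a throwaway package so the example is runnable anywhere.
# `demo_validator` and its CSV are hypothetical, not the real project layout.
_tmp = Path(tempfile.mkdtemp())
_pkg = _tmp / "demo_validator"
(_pkg / "data").mkdir(parents=True)
(_pkg / "__init__.py").write_text("")
(_pkg / "data" / "naics_codes.csv").write_text(
    "code,title\n111,Crop Production\n112,Animal Production and Aquaculture\n"
)
sys.path.insert(0, str(_tmp))
importlib.invalidate_caches()


@lru_cache(maxsize=None)
def load_naics_codes() -> dict[str, str]:
    """Read the packaged CSV once; later calls return the cached dict."""
    # Resolve the file by package name instead of a hand-built ROOT_PATH.
    resource = files("demo_validator").joinpath("data/naics_codes.csv")
    with resource.open("r", encoding="utf-8", newline="") as f:
        return {row["code"]: row["title"] for row in csv.DictReader(f)}


codes = load_naics_codes()
print(codes["111"])
```

Because the lookup goes through the package name, the same code works whether the project is run from a checkout or installed as a wheel, which is what makes the path manipulation in `__init__.py` unnecessary.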
Showing 35 changed files with 469 additions and 382 deletions.
@@ -0,0 +1,3 @@
# FFIEC's Census Flat File

- https://www.ffiec.gov/censusapp.htm
@@ -0,0 +1,3 @@
# North American Industry Classification System (NAICS) codes

- https://www.census.gov/naics/?48967
@@ -0,0 +1,57 @@
import csv
import os
import sys

import pandas as pd


# column header text containing NAICS code
NAICS_CODE_COL = "2022 NAICS US Code"
# column header text containing NAICS title/description
NAICS_TITLE_COL = "2022 NAICS US Title"


"""
filter NAICS data with only 3 digit codes
Exits with:
    2: when input excel file does not exist
    3: when output csv file already exists
"""
if __name__ == "__main__":
    if len(sys.argv) != 3:
        print(f"Usage: {sys.argv[0]} <raw-src> <csv-dest>")
        sys.exit(1)

    raw_src = sys.argv[1]
    csv_dest = sys.argv[2]

    if not os.path.isfile(raw_src):
        print(f"source file does not exist: {raw_src}")
        sys.exit(2)

    if os.path.isfile(csv_dest):
        print(f"destination file already exists: {csv_dest}")
        sys.exit(3)

    # read everything as strings so leading zeros and codes are preserved
    df = pd.read_excel(raw_src, dtype=str, na_filter=False)

    print(f'source file successfully read: {raw_src}')

    # add header
    result = [["code", "title"]]

    # read excel file
    # and create csv data list
    for index, row in df.iterrows():
        code = str(row[NAICS_CODE_COL])
        if len(code) == 3:
            a_row = [code, str(row[NAICS_TITLE_COL])]
            result.append(a_row)

    # output data to csv file; newline="" avoids blank rows on Windows
    with open(csv_dest, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerows(result)

    print(f'destination file successfully written: {csv_dest}')
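As an aside, the row-by-row loop in the script above could also be written as a vectorized pandas filter. The sketch below is one possible alternative, not the code in this commit: the helper name `three_digit_codes` and the in-memory sample frame are made up for illustration, while the two column constants are copied from the script.

```python
import pandas as pd

NAICS_CODE_COL = "2022 NAICS US Code"
NAICS_TITLE_COL = "2022 NAICS US Title"


def three_digit_codes(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only rows whose NAICS code is exactly 3 characters long."""
    mask = df[NAICS_CODE_COL].str.len() == 3
    return (
        df.loc[mask, [NAICS_CODE_COL, NAICS_TITLE_COL]]
        .rename(columns={NAICS_CODE_COL: "code", NAICS_TITLE_COL: "title"})
        .reset_index(drop=True)
    )


# Hypothetical sample standing in for the frame returned by pd.read_excel(...).
sample = pd.DataFrame(
    {
        NAICS_CODE_COL: ["11", "111", "1111", "112"],
        NAICS_TITLE_COL: [
            "Agriculture, Forestry, Fishing and Hunting",
            "Crop Production",
            "Oilseed and Grain Farming",
            "Animal Production and Aquaculture",
        ],
    }
)

filtered = three_digit_codes(sample)
csv_text = filtered.to_csv(index=False)  # header row is "code,title"
print(csv_text)
```

Keeping the filter in a function that takes and returns a DataFrame also makes it testable without touching the filesystem, which the `__main__`-only layout of the script does not allow.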
File renamed without changes.
File renamed without changes.
@@ -0,0 +1,47 @@
"""
Subclasses of Pandera's `Check` class
"""

from enum import StrEnum
from typing import Any, Callable, Type

from pandera import Check
from pandera.backends.base import BaseCheckBackend
from pandera.backends.pandas.checks import PandasCheckBackend


class Severity(StrEnum):
    ERROR = 'error'
    WARNING = 'warning'


class SBLCheck(Check):
    """
    A Pandera `Check` subclass that requires an `id`, a `name`, and a
    `description` to be specified. Additionally, an attribute named
    `severity` is added to the class to enable distinction between
    warnings and errors. `severity` has no default, so every check
    must state explicitly whether it is an error or a warning.
    """

    def __init__(self, check_fn: Callable, id: str, name: str, description: str, severity: Severity, **check_kwargs):
        """
        Subclass of Pandera's `Check`, with special handling for severity level
        Args:
            check_fn (Callable): A function which evaluates the validity of the column(s) being tested.
            id (str, required): Unique identifier for a check
            name (str, required): Unique name for a check
            description (str, required): Long-form description of a check
            severity (Severity, required): The severity of a check (error or warning)
            check_kwargs (Any, optional): Parameters passed to `check_fn` function
        """

        self.severity = severity

        super().__init__(check_fn, title=id, name=name, description=description, **check_kwargs)

    @classmethod
    def get_backend(cls, check_obj: Any) -> Type[BaseCheckBackend]:
        """Assume Pandas DataFrame and return PandasCheckBackend"""
        return PandasCheckBackend
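To illustrate why a string-valued severity enum removes the translation step when building JSON output, here is a small sketch that does not depend on Pandera. The `Finding` dataclass, the `findings_to_json` helper, and the check ids and names are hypothetical and are not part of the repo. The enum is declared as `str, Enum` rather than `StrEnum` only so the sketch also runs on Python versions before 3.11.

```python
import json
from dataclasses import dataclass
from enum import Enum


class Severity(str, Enum):
    """Stand-in for the StrEnum in the diff; members are plain str instances."""

    ERROR = "error"
    WARNING = "warning"


@dataclass
class Finding:
    """Hypothetical record of one failed check (not a class from the repo)."""

    check_id: str
    name: str
    severity: Severity


def findings_to_json(findings: list[Finding]) -> str:
    # Severity members are str instances, so json.dumps writes "error" or
    # "warning" directly; no bool-to-label mapping is needed.
    return json.dumps(
        [{"id": f.check_id, "name": f.name, "severity": f.severity} for f in findings],
        indent=2,
    )


findings = [
    Finding("X0001", "demo.required_field", Severity.ERROR),
    Finding("X0002", "demo.unusual_value", Severity.WARNING),
]
print(findings_to_json(findings))
```

Constructing the enum from an unsupported value, for example `Severity("info")`, raises `ValueError`, which is the kind of guardrail a bare `warning: bool` with a default cannot provide.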