refactor: standardize repo structure and other prep for open-sourcing #60
Merged
13 commits
- 722b981 Move files into more standard Python project layout (hkeeler)
- d4529ec Fix issues related to repo restructure (hkeeler)
- ccee738 Move data-related code and config under `data` dir (hkeeler)
- baeb814 Merge config.py and tools under data dir (hkeeler)
- 9044db8 Add README for external data sources (hkeeler)
- 8bc7ab3 Improve SBLCheck constructor args (hkeeler)
- 58873bc Fix multi-line string that was setup as a tuple (hkeeler)
- b732419 Print CLI output as JSON instead of Python dict. (hkeeler)
- 3b18289 black and ruff fixups (hkeeler)
- 5a515e9 Fix path to `tests` dir in DevContainer setup (hkeeler)
- 3901ca3 Remove `tools` dir from black exclude list (hkeeler)
- a86f79e Add `--verbose` to `black` Action to debug failures (hkeeler)
- 3793bca Will reverting Action's `black` version back help? (hkeeler)
This file was deleted.
@@ -0,0 +1,3 @@
# FFIEC's Census Flat File

- https://www.ffiec.gov/censusapp.htm
@@ -0,0 +1,3 @@
# North American Industry Classification System (NAICS) codes

- https://www.census.gov/naics/?48967
@@ -0,0 +1,57 @@
import csv
import os
import sys

import pandas as pd


# column header text containing naics code
NAICS_CODE_COL = "2022 NAICS US Code"
# column header text containing naics title/description
NAICS_TITLE_COL = "2022 NAICS US Title"


"""
Filter NAICS data down to 3-digit codes only

Exits with a non-zero status when:
    the input Excel file does not exist (status 2)
    the output CSV file already exists (status 3)
"""
if __name__ == "__main__":
    if len(sys.argv) != 3:
        print(f"Usage: {sys.argv[0]} <raw-src> <csv-dest>")
        sys.exit(1)

    raw_src = sys.argv[1]
    csv_dest = sys.argv[2]

    if not os.path.isfile(raw_src):
        print(f"source file does not exist: {raw_src}")
        sys.exit(2)

    if os.path.isfile(csv_dest):
        print(f"destination file already exists: {csv_dest}")
        sys.exit(3)

    df = pd.read_excel(raw_src, dtype=str, na_filter=False)

    print(f'source file successfully read: {raw_src}')

    # add header
    result = [["code", "title"]]

    # read excel file
    # and create csv data list
    for index, row in df.iterrows():
        code = str(row[NAICS_CODE_COL])
        if len(code) == 3:
            a_row = [code, str(row[NAICS_TITLE_COL])]
            result.append(a_row)

    # output data to csv file
    with open(csv_dest, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerows(result)

    print(f'destination file successfully written: {csv_dest}')
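The filtering step in the script above can be exercised without an Excel file if it is pulled into a small pure function. The following is a minimal sketch under stated assumptions, not part of this PR: `three_digit_rows` and `to_csv_text` are hypothetical helper names, plain dicts stand in for the DataFrame rows, and the sample records are illustrative only.

```python
import csv
import io

# Same column headers the script reads from the NAICS spreadsheet.
NAICS_CODE_COL = "2022 NAICS US Code"
NAICS_TITLE_COL = "2022 NAICS US Title"


def three_digit_rows(records):
    """Return CSV rows (header first) for records whose code is exactly 3 characters long.

    Hypothetical helper: `records` is any iterable of dict-like rows.
    """
    result = [["code", "title"]]
    for record in records:
        code = str(record[NAICS_CODE_COL])
        if len(code) == 3:
            result.append([code, str(record[NAICS_TITLE_COL])])
    return result


def to_csv_text(rows):
    """Serialize rows with csv.writer into an in-memory string instead of a file."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


# Illustrative input: one 2-digit, one 3-digit, and one 4-digit code.
sample = [
    {NAICS_CODE_COL: "11", NAICS_TITLE_COL: "Agriculture, Forestry, Fishing and Hunting"},
    {NAICS_CODE_COL: "111", NAICS_TITLE_COL: "Crop Production"},
    {NAICS_CODE_COL: "1111", NAICS_TITLE_COL: "Oilseed and Grain Farming"},
]

rows = three_digit_rows(sample)
csv_text = to_csv_text(rows)
print(csv_text)
```

Only the 3-digit record survives the filter, so the output holds the header plus a single data row.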
File renamed without changes.
File renamed without changes.
@@ -0,0 +1,47 @@
"""
Subclasses of Pandera's `Check` class
"""

from enum import StrEnum
from typing import Any, Callable, Type

from pandera import Check
from pandera.backends.base import BaseCheckBackend
from pandera.backends.pandas.checks import PandasCheckBackend


class Severity(StrEnum):
    ERROR = 'error'
    WARNING = 'warning'


class SBLCheck(Check):
    """
    A Pandera `Check` subclass that requires an `id`, a `name`, a
    `description`, and a `severity` to be specified. The `severity`
    attribute distinguishes errors from warnings. It has no default
    value, so every check must state its severity explicitly. The
    `id` is passed through to Pandera as the check's `title`, and
    any extra keyword arguments are forwarded unchanged.
    """

    def __init__(self, check_fn: Callable, id: str, name: str, description: str, severity: Severity, **check_kwargs):
        """
        Subclass of Pandera's `Check`, with special handling for severity level
        Args:
            check_fn (Callable): A function which evaluates the validity of the column(s) being tested.
            id (str, required): Unique identifier for a check
            name (str, required): Unique name for a check
            description (str, required): Long-form description of a check
            severity (Severity, required): The severity of a check (error or warning)
            check_kwargs (Any, optional): Parameters passed to `check_fn` function
        """

        self.severity = severity

        super().__init__(check_fn, title=id, name=name, description=description, **check_kwargs)

    @classmethod
    def get_backend(cls, check_obj: Any) -> Type[BaseCheckBackend]:
        """Assume Pandas DataFrame and return PandasCheckBackend"""
        return PandasCheckBackend
Is this used? I may have missed the usage
Yeah, it's required by the NAICS code processing script.
I can add that detail to the new README I added for that dataset. Each of those could use instructions on how to run those two scripts too.