add more details
Aldrian Harjati committed Oct 4, 2023
1 parent 98da8ef commit c9361d0
Showing 1 changed file with 55 additions and 23 deletions.
78 changes: 55 additions & 23 deletions README.md
@@ -2,16 +2,21 @@

This is a RegTech submission data parser and validator which makes use of Pandera. You can read about Pandera schemas [here](https://pandera.readthedocs.io/en/stable/dataframe_schemas.html).

## Pre-requisites
- Poetry is used as the package management tool. Once it is installed, running `poetry install` in the root of the project should install all the dependencies needed by the app.
- Docker is used for local development, where ancillary services will run.
- (Optional) Visual Studio Code for local development.

## Dependencies
All packages and libraries used in this repository can be found in `pyproject.toml`.

## Dev Container Setup

The code in this repository is developed and run inside of a dev container within Visual Studio Code. These instructions will not work if using an alternative editor such as Vim or Emacs. To build, run, and attach the container to VS Code you'll need to have Docker installed on your system, and the `Dev Containers` extension installed within VS Code.

Open this repository within VS Code and press `COMMAND + SHIFT + p` on your keyboard. This will open the command bar at the top of your window. Enter `Dev Containers: Rebuild and Reopen in Container`. VS Code will open a new window and you'll see a status message towards the bottom right of your screen that the container is building and attaching. This will take a few minutes the first time because Docker needs to build the container without a build cache. You may receive a notification that VS Code wants to perform a reload because some extensions could not load. Sometimes this happens because extensions are loaded in conflicting orders and dependencies are not satisfied.

## Running the Demo

If using VS Code, setup is completed by simply running the code within a Dev Container. If you're not making use of VS Code, make your life easier and use VS Code :sunglasses:. See the instructions above for setting up the Dev Container.

## Development
There are a few files in `src/validator` that will be of interest.
- `checks.py` defines a custom Pandera Check class called `SBLCheck`.
- `global_data.py` defines functions to parse NAICS and GEOIDs.
@@ -20,28 +25,11 @@ There are few files in `src/validator` that will be of interest.
- Lastly, the file `main.py` pulls everything together and illustrates how the schema can catch the various validation errors present in our mock, invalid dataset and different LEI values.

## Test data
- The repo includes tests that can be executed using `pytest`. These tests can be located under `src/tests`.
- The repo also includes 2 test datasets for manual testing, one with all valid data, and one where each line
represents a different failed validation, or different permutation of the same
failed validation.
- The repo includes unit tests that can be executed using `pytest`. These tests can be located under `src/tests`.
- The repo also includes 2 test datasets for manual testing, one with all valid data, and one where each line represents a different failed validation, or different permutation of the same failed validation.
- [`sbl-validations-pass.csv`](src/tests/data/sbl-validations-pass.csv)
- [`sbl-validations-fail.csv`](src/tests/data/sbl-validations-fail.csv)

### Manual Test
```sh
# Test validating the "good" file
# If passing lei value, pass lei as first arg and csv_path as second arg
python src/validator/main.py 000TESTFIUIDDONOTUSE src/tests/data/sbl-validations-pass.csv
# else just pass the csv_path as arg
python src/validator/main.py src/tests/data/sbl-validations-pass.csv

# Test validating the "bad" file
python src/validator/main.py 000TESTFIUIDDONOTUSE src/tests/data/sbl-validations-fail.csv
# or
python src/validator/main.py src/tests/data/sbl-validations-fail.csv
```


## Development

Development Process
@@ -68,8 +56,52 @@ Development standard practice
* Example: "denial_reasons. enum_value_conflict"
![Validation ID](validation_id.png)

## Running Validator
`main.py` lets you validate a CSV file with or without an LEI value.

### Using VS Code
If using VS Code, the validator can be executed by running `main.py` within a Dev Container. To run `main.py`, use these commands in the VS Code terminal.
```sh
# Test validating the "good" file
# If passing lei value, pass lei as first arg and csv_path as second arg
python src/validator/main.py 000TESTFIUIDDONOTUSE src/tests/data/sbl-validations-pass.csv
# else just pass the csv_path as arg
python src/validator/main.py src/tests/data/sbl-validations-pass.csv

# Test validating the "bad" file
python src/validator/main.py 000TESTFIUIDDONOTUSE src/tests/data/sbl-validations-fail.csv
# or
python src/validator/main.py src/tests/data/sbl-validations-fail.csv
```

### Using terminal
If using a terminal, you can use `poetry` to run the code. To run `main.py`, use these commands.
```sh
# Test validating the "good" file
# If passing lei value, pass lei as first arg and csv_path as second arg
poetry run python src/validator/main.py 000TESTFIUIDDONOTUSE src/tests/data/sbl-validations-pass.csv
# else just pass the csv_path as arg
poetry run python src/validator/main.py src/tests/data/sbl-validations-pass.csv

# Test validating the "bad" file
poetry run python src/validator/main.py 000TESTFIUIDDONOTUSE src/tests/data/sbl-validations-fail.csv
# or
poetry run python src/validator/main.py src/tests/data/sbl-validations-fail.csv
```
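
The argument convention used in the commands above (an optional LEI first, then the CSV path) can be sketched in a few lines of Python. This is only an illustration of that convention, not the code in `main.py`: the helper name `parse_cli_args` and the usage message are assumptions made for this example.

```python
from typing import Optional, Sequence, Tuple


def parse_cli_args(argv: Sequence[str]) -> Tuple[Optional[str], str]:
    """Return (lei, csv_path) from the arguments that follow the script name.

    Hypothetical helper: it mirrors the documented convention only and is
    not taken from main.py.
    """
    if len(argv) == 1:
        # Only the CSV path was given, so there is no LEI.
        return None, argv[0]
    if len(argv) == 2:
        # The LEI comes first and the CSV path second.
        return argv[0], argv[1]
    raise ValueError("usage: main.py [lei] csv_path")


# Both documented call shapes, with the paths used in this README.
print(parse_cli_args(["000TESTFIUIDDONOTUSE", "src/tests/data/sbl-validations-pass.csv"]))
print(parse_cli_args(["src/tests/data/sbl-validations-pass.csv"]))
```

Returning `None` for a missing LEI keeps the two call shapes distinguishable without relying on the format of the LEI string itself.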


## Running Tests
This repository uses `pytest` for its tests.

### Using VS Code
If using VS Code, tests can be run within a Dev Container.

### Using terminal
To run the tests from a terminal, run `poetry run pytest` in the root directory.

## Coverage
[![Coverage badge](https://github.com/cfpb/regtech-data-validator/raw/python-coverage-comment-action-data/badge.svg)](https://github.com/cfpb/regtech-data-validator/tree/python-coverage-comment-action-data)
Complete coverage details can be found under the [`python-coverage-comment-action-data` branch](https://github.com/cfpb/regtech-data-validator/tree/python-coverage-comment-action-data).

## Contributing
[CFPB](https://www.consumerfinance.gov/) is developing the `RegTech Data Validator` in the open to maximize transparency and encourage third-party contributions. If you want to contribute, please read and abide by the terms of the [License](./LICENSE) for this project. Pull Requests are always welcome.