clean up markup warnings
Aldrian Harjati committed Oct 4, 2023
1 parent c9361d0 commit 4fff2d1
Showing 1 changed file with 31 additions and 23 deletions.
54 changes: 31 additions & 23 deletions README.md
@@ -3,11 +3,13 @@
This is a RegTech submission data parser and validator which makes use of Pandera. You can read about Pandera schemas [here](https://pandera.readthedocs.io/en/stable/dataframe_schemas.html).

## Pre-requisites

- Poetry is used as the package management tool. Once it is installed, running `poetry install` in the root of the project should install all the dependencies needed by the app.
- Docker is used for local development, where ancillary services will run.
- (Optional) Visual Studio Code for local development.

## Dependencies

All packages and libraries used in this repository can be found in `pyproject.toml`.

## Dev Container Setup
@@ -17,20 +19,23 @@ The code in this repository is developed and run inside of a dev container withi
Open this repository within VS Code and press `COMMAND + SHIFT + p` on your keyboard. This will open the command bar at the top of your window. Enter `Dev Containers: Rebuild and Reopen in Container`. VS Code will open a new window and you'll see a status message towards the bottom right of your screen that the container is building and attaching. This will take a few minutes the first time because Docker needs to build the container without a build cache. You may receive a notification that VS Code wants to perform a reload because some extensions could not load. Sometimes this happens because extensions are loaded in conflicting orders and dependencies are not satisfied.

## Development
There are a few files in `src/validator` that will be of interest.
- `checks.py` defines a custom Pandera Check class called `SBLCheck`.

There are a few files in `src/validator` that will be of interest.

- `checks.py` defines a custom Pandera Check class called `SBLCheck`.
- `global_data.py` defines functions to parse NAICS and GEOIDs.
- `phase_validations.py` defines phase 1 and phase 2 Pandera schema/checks used for validating the SBLAR data.
- `check_functions.py` contains a collection of functions to be run against the data that are a bit too complex to be implemented directly within the schema as Lambda functions.
- Lastly, the file `main.py` pulls everything together and illustrates how the schema can catch the various validation errors present in our mock, invalid dataset and different LEI values.

## Test data

- The repo includes unit tests that can be executed using `pytest`. These tests are located under `src/tests`.
- The repo also includes 2 test datasets for manual testing: one with all valid data, and one where each line represents a different failed validation, or a different permutation of the same failed validation.
- [`sbl-validations-pass.csv`](src/tests/data/sbl-validations-pass.csv)
- [`sbl-validations-fail.csv`](src/tests/data/sbl-validations-fail.csv)

## Development
## Development Process and Standard

Development Process
Below are the steps the development team follows to fix issues, develop new features, etc.
@@ -43,24 +48,26 @@ Below are the steps the development team follows to fix issues, develop new feat
6. The PR is merged only by a separate member in the dev team

Development standard practice
* Check functions should focus on reuse.
* Most of the validations share logic with other validations.
* Avoid using lambdas for Check functions.
* They do not promote reuse.
* They are harder to debug.
* They are harder to test.
* Check function signatures should reflect the functionality.
* Check functions should have corresponding unit tests.
* [Unit Test](./src/tests/test_check_functions.py)
* Check definitions' name should be set to validation ID.
* Example: "denial_reasons.enum_value_conflict"

- Check functions should focus on reuse.
- Most of the validations share logic with other validations.
- Avoid using lambdas for Check functions.
- They do not promote reuse.
- They are harder to debug.
- They are harder to test.
- Check function signatures should reflect the functionality.
- Check functions should have corresponding unit tests.
- [Unit Test](./src/tests/test_check_functions.py)
- Check definitions' name should be set to validation ID.
- Example: "denial_reasons.enum_value_conflict"
![Validation ID](validation_id.png)
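
The reuse and testability points above can be illustrated with a minimal sketch. It is plain Python and does not use Pandera or the repository's `SBLCheck` class; the function names, the accepted values `"1"`/`"2"`, and the exclusive value `"999"` are all hypothetical and chosen only for illustration.

```python
from functools import partial
from typing import Iterable


def is_in_accepted_values(value: str, accepted_values: Iterable[str]) -> bool:
    """Return True when `value` is one of `accepted_values`."""
    return value in set(accepted_values)


def has_no_conflicting_value(
    values: str, exclusive_value: str, separator: str = ";"
) -> bool:
    """Return True unless `exclusive_value` appears alongside other values."""
    items = [item.strip() for item in values.split(separator) if item.strip()]
    return exclusive_value not in items or len(items) == 1


# Reuse: different validations bind different parameters to the same named
# function instead of each defining its own lambda.
# The bound values below are hypothetical examples, not real validation rules.
check_yes_no = partial(is_in_accepted_values, accepted_values=("1", "2"))
check_exclusive_code = partial(has_no_conflicting_value, exclusive_value="999")
```

Because each check is a named function with an explicit signature, it can be imported and asserted against directly in a unit test, which is harder to do with an inline lambda.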

## Running Validator
`main.py` lets you validate a CSV file with or without an LEI value.

# Using VSCode
`main.py` lets you validate a CSV file with or without an LEI value.

If using VS Code, the validator can be executed by running `main.py` within a Dev Container. To run `main.py`, you can run these commands in the VS Code terminal.

```sh
# Test validating the "good" file
# If passing lei value, pass lei as first arg and csv_path as second arg
@@ -74,8 +81,8 @@ python src/validator/main.py 000TESTFIUIDDONOTUSE src/tests/data/sbl-validations
python src/validator/main.py src/tests/data/sbl-validations-fail.csv
```

# Using terminal
If using a terminal, you can use `poetry` to run the code. To run `main.py`, you can use these commands.

```sh
# Test validating the "good" file
# If passing lei value, pass lei as first arg and csv_path as second arg
@@ -89,24 +96,25 @@ poetry run python src/validator/main.py 000TESTFIUIDDONOTUSE src/tests/data/sbl-
poetry run python src/validator/main.py src/tests/data/sbl-validations-fail.csv
```
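
The argument order described in the comments above (LEI first and CSV path second, or the CSV path alone) could be handled along the following lines. This is only a sketch: `parse_cli_args` is a hypothetical helper, and the actual argument handling in `main.py` may differ.

```python
from typing import Optional, Sequence, Tuple


def parse_cli_args(argv: Sequence[str]) -> Tuple[Optional[str], str]:
    """Split command-line arguments into (lei, csv_path).

    One argument is read as csv_path only; two are read as lei, then csv_path.
    `argv` is expected to exclude the script name (for example, sys.argv[1:]).
    """
    if len(argv) == 1:
        return None, argv[0]
    if len(argv) == 2:
        return argv[0], argv[1]
    raise ValueError("expected: [lei] csv_path")
```

For example, `parse_cli_args(["src/tests/data/sbl-validations-fail.csv"])` yields no LEI and the CSV path, while passing `000TESTFIUIDDONOTUSE` first yields both values.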


## Running Test
This repository is using `pytest`.

# Using VSCode
If using VS Code, tests can be completed within a Dev Container.
This repository uses `pytest`. If using VS Code, tests can be run within a Dev Container. If using a local terminal or console, run `poetry run pytest` in the root directory.

# Using terminal
To run `pytest`, you can use this command `poetry run pytest` in the root directory
```sh
poetry run pytest
```

## Coverage

[![Coverage badge](https://github.com/cfpb/regtech-data-validator/raw/python-coverage-comment-action-data/badge.svg)](https://github.com/cfpb/regtech-data-validator/tree/python-coverage-comment-action-data)
Complete coverage details can be found under the [`python-coverage-comment-action-data` branch](https://github.com/cfpb/regtech-data-validator/tree/python-coverage-comment-action-data).

## Contributing

[CFPB](https://www.consumerfinance.gov/) is developing the `RegTech Data Validator` in the open to maximize transparency and encourage third-party contributions. If you want to contribute, please read and abide by the terms of the [License](./LICENSE) for this project. Pull Requests are always welcome.

## Open source licensing info

1. [TERMS](./TERMS.md)
1. [LICENSE](./LICENSE)
1. [CFPB Source Code Policy](https://github.com/cfpb/source-code-policy/)
