28-Ability to run each stage independently #29

Open
wants to merge 20 commits into
base: master

Conversation


@jigglepuff jigglepuff commented Apr 14, 2020

Closes: #28

New files:
tests/test_fetcher.py - Fetcher unit tests, and entry point to run Fetch stage by itself.

tests/test_parser.py - Parser unit tests, and entry point to run Parse stage by itself.

tests/test_extractor.py - Extractor unit tests, and entry point to run Extract stage by itself.

tests/test_transformer.py - Transformer unit tests, and entry point to run Transform stage by itself.

tests/test_loader.py - Loader unit tests, and entry point to run Load stage by itself.
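As a rough illustration of how one file can hold unit tests and also act as a standalone entry point, here is a minimal sketch. It is not the code from this PR: `build_parser`, `run_stage`, and `main` are hypothetical names, and `run_stage` is a placeholder for a real stage. Only the `--local-sources` flag comes from the README change under review.

```python
import argparse


def build_parser(expect_local_sources=True):
    """Hypothetical helper: build a parser that expects only the flags this stage needs."""
    parser = argparse.ArgumentParser(description="Run one ETL stage by itself")
    if expect_local_sources:
        parser.add_argument(
            "--local-sources",
            nargs="+",
            default=[],
            help="paths to files produced by an earlier stage",
        )
    return parser


def run_stage(paths):
    """Placeholder for a real stage; it just returns the input paths sorted."""
    return sorted(paths)


def test_run_stage_sorts_inputs():
    # pytest collects this function because its name starts with test_
    assert run_stage(["b.csv", "a.csv"]) == ["a.csv", "b.csv"]


def main(argv=None):
    # argv=None makes argparse read sys.argv[1:], as it would from a shell
    args = build_parser().parse_args(argv)
    return run_stage(args.local_sources)


if __name__ == "__main__":
    # Executed only when the file is run directly, not when pytest imports it
    print(main())
```

Running the file with `pytest` would collect only the `test_` function, while running it with `python3` would go through `main()`.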

Changes:
app.py - changed `from etl import fetcher` to `from etl.fetcher import Fetcher` to avoid a collision between the module name and a variable name. This allows us to copy code from app.py into test fixtures.

etl/command_line_args.py - added function arguments to indicate which arguments argparse should expect. This allows test fixtures to reuse this module.

etl/fetcher.py - removed imports that are not used. This also avoids import errors when running from directories other than the project root directory.

etl/utils.py - added custom to_csv() and read_csv() to preserve data types when exporting and importing CSVs between the extractor and transformer stages.

requirements.txt - added dependencies for pytest (Python testing framework)

README.md - added instructions to run standalone stages and unit tests

.gitignore - added test yaml files: data/sources/test_sources.yml and data/transform_tasks/test_transform_tasks.yml. We do not need to track these files, since they are for developers to test and customize (for now).
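To show why a custom CSV round trip helps preserve data types between stages, here is a minimal sketch. It is an assumption, not the implementation in etl/utils.py: it uses only the standard library and records each column's type name in a second header row, so that a string such as a zero-padded ID is not read back as a number. The function signatures and the column names in the usage note are hypothetical.

```python
import csv
import io

# Map the type names stored in the file back to Python constructors.
_CASTS = {"int": int, "float": float, "str": str}


def to_csv(rows, columns, fileobj):
    """Write rows with an extra header line recording each column's type name."""
    writer = csv.writer(fileobj)
    writer.writerow(columns)
    # Take each column's type from the first row (assumes at least one row
    # and that every column holds a single type).
    writer.writerow([type(value).__name__ for value in rows[0]])
    writer.writerows(rows)


def read_csv(fileobj):
    """Read a file written by to_csv(), casting each cell back to its recorded type."""
    reader = csv.reader(fileobj)
    columns = next(reader)
    casts = [_CASTS[name] for name in next(reader)]
    rows = [[cast(cell) for cast, cell in zip(casts, row)] for row in reader]
    return columns, rows
```

For example, writing `[["00123", 4, 2.5]]` with columns `["parcel", "units", "acres"]` to an `io.StringIO` buffer and reading it back keeps `"00123"` as a string rather than turning it into the integer 123.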

@jigglepuff jigglepuff marked this pull request as ready for review April 22, 2020 02:35
Collaborator

@mrpetrocket mrpetrocket left a comment

Started reviewing. Haven't finished yet; ran into an error in the extractor. The error happens on master too so I don't think it's from this PR.

@@ -148,3 +148,7 @@ transform_vacant_table.csv

# sqlite test db
vacancy.sqlite

# config files for unit tests
data/source/test_sources.yml
Collaborator

the database config files seem to be named config_{environment}.yml but these files are named {env}_sources.yml or similar. I would say we should put the environment name in a consistent place in the filename, either front or back.


4. Run `Transformer` only:
```bash
python3 tests/test_transformer.py --local-sources src/BldgCom.csv src/BldgRes.csv src/par.dbf.csv src/prcl.shp.csv src/Prcl.csv
```
Collaborator

Do those csv files come from running one of the other stages in isolation? If so we should document that here.

Successfully merging this pull request may close these issues.

Ability to run each stage independently
2 participants