28-Ability to run each stage independently #29
base: master
…to 24-progress_bar
…s to help import and export csv preserving datatype metadata
Started reviewing. Haven't finished yet; ran into an error in the extractor. The error happens on master too so I don't think it's from this PR.
@@ -148,3 +148,7 @@ transform_vacant_table.csv
 
 # sqlite test db
 vacancy.sqlite
+
+# config files for unit tests
+data/source/test_sources.yml
The database config files seem to be named `config_{environment}.yml`, but these files are named `{env}_sources.yml` or similar. We should put the environment name in a consistent place in the filename, either at the front or at the back.
4. Run `Transformer` only:
```bash
python3 tests/test_transformer.py --local-sources src/BldgCom.csv src/BldgRes.csv src/par.dbf.csv src/prcl.shp.csv src/Prcl.csv
```
Do those CSV files come from running one of the other stages in isolation? If so, we should document that here.
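For context on the `--local-sources` flag in the command above, here is a minimal sketch of how a shared argument module might expose an option only to the stages that ask for it. The function name `get_args`, its keyword parameters, and the `--db` option are hypothetical; only the `--local-sources` flag and the file paths come from the README change, and the actual `etl/command_line_args.py` may look different.

```python
import argparse


def get_args(argv=None, expect_local_sources=False, expect_db=False):
    """Parse only the options the caller says it expects.

    Hypothetical signature: the keyword names are assumptions made for
    this sketch, not the project's real API.
    """
    parser = argparse.ArgumentParser(description="Run a single ETL stage")
    if expect_local_sources:
        # nargs="+" accepts one or more file paths after the flag.
        parser.add_argument(
            "--local-sources",
            nargs="+",
            default=[],
            help="CSV files to read instead of running the previous stage",
        )
    if expect_db:
        # Hypothetical second option, shown only to illustrate the toggle.
        parser.add_argument("--db", default="dev", help="database environment")
    return parser.parse_args(argv)


# A transformer entry point would request only the options it needs.
args = get_args(
    ["--local-sources", "src/BldgCom.csv", "src/BldgRes.csv"],
    expect_local_sources=True,
)
print(args.local_sources)
```

Because each caller opts in to the options it needs, a test fixture can reuse the same module without argparse rejecting flags that are irrelevant to that stage.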
Closes: #28
New files:
- `tests/test_fetcher.py` - `Fetcher` unit tests, and entry point to run `Fetch` stage by itself.
- `tests/test_parser.py` - `Parser` unit tests, and entry point to run `Parse` stage by itself.
- `tests/test_extractor.py` - `Extractor` unit tests, and entry point to run `Extract` stage by itself.
- `tests/test_transformer.py` - `Transformer` unit tests, and entry point to run `Transform` stage by itself.
- `tests/test_loader.py` - `Loader` unit tests, and entry point to run `Load` stage by itself.

Changes:
- `app.py` - change `from etl import fetcher` to `from etl.fetcher import Fetcher` to avoid module name and variable name collision. This allows us to copy code from `app.py` to test fixtures.
- `etl/command_line_args.py` - added function arguments to indicate which arguments argparse should expect. This allows test fixtures to reuse this module.
- `etl/fetcher.py` - removed imports that are not used. This also avoids import errors when running from directories other than the project root directory.
- `etl/utils.py` - added custom `to_csv()` and `read_csv()` to preserve datatype when exporting and importing CSVs between stages `extractor` and `transformer`.
- `requirements.txt` - added dependencies for `pytest` (Python testing framework).
- `README.md` - added instructions to run standalone stages and unit tests.
- `.gitignore` - added test yaml files: `data/sources/test_sources.yml` and `data/transform_tasks/test_transform_tasks.yml`. We do not need to track these files, since they are for developers to test and customize (for now).
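
To illustrate why custom `to_csv()` and `read_csv()` helpers are useful between stages, here is a minimal standard-library sketch of one way to carry datatype metadata in a CSV header. This is not the code in `etl/utils.py`, which is not shown here; the `name:type` header format, the function signatures, and the column names `parcel_id`, `acres`, and `ward` are all assumptions made for the example.

```python
import csv
import io

# Hypothetical type tags; the real helpers may record metadata differently.
_CASTS = {"int": int, "float": float, "str": str}


def to_csv(rows, columns, handle):
    """Write rows under a header of `name:type` cells.

    `columns` is a list of (name, type_tag) pairs.
    """
    writer = csv.writer(handle)
    writer.writerow([f"{name}:{tag}" for name, tag in columns])
    writer.writerows(rows)


def read_csv(handle):
    """Read a file written by to_csv() and cast each value back to its type."""
    reader = csv.reader(handle)
    header = next(reader)
    # Split on the last colon so column names may themselves contain colons.
    columns = [tuple(cell.rsplit(":", 1)) for cell in header]
    rows = [
        [_CASTS[tag](value) for value, (_, tag) in zip(record, columns)]
        for record in reader
    ]
    return columns, rows


# Round trip: the string "0042" keeps its leading zeros instead of becoming 42.
buffer = io.StringIO(newline="")
to_csv(
    [[1001, 12.5, "0042"]],
    [("parcel_id", "int"), ("acres", "float"), ("ward", "str")],
    buffer,
)
buffer.seek(0)
columns, rows = read_csv(buffer)
print(rows)
```

Without the type tags, a plain CSV reader would return every value as a string, and a type-inferring reader could turn an identifier such as `"0042"` into the integer 42, which is the kind of mismatch these helpers are meant to avoid when one stage's output feeds the next.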