Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Begin warning people about spaces in model names #9886

Merged
merged 20 commits into from
Apr 12, 2024

Conversation

QMalcolm
Copy link
Contributor

resolves #9397

Problem

We don't support models with spaces in their names. However, we haven't been actually enforcing this. If a person have spaces in their model names, it causes issues when using selectors. Additionally, depending on a person's operating system, there were other edge case problems with spaces in model names.

Solution

Begin warning people about spaces in their model names. Depending on some flags, we give more or less information. If debug mode is not on, then we log out the first offending model name and a count of how many offending model names their are in total. If debug mode is on, then we log out every offending model name. Additionally, by default currently these logs are warnings and won't stop dbt from running. However, if one sets allow_spaces_in_model_names to False in their dbt_project.yml, then the logs become errors and found offending model names will stop dbt from running.

Checklist

  • I have read the contributing guide and understand what's expected of me
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • This PR has no interface changes (e.g. macros, cli, logs, json artifacts, config files, adapter interface, etc) or this PR has already received feedback and approval from Product or DX
  • This PR includes type annotations for new and modified functions

@QMalcolm QMalcolm requested a review from a team as a code owner April 10, 2024 06:48
@cla-bot cla-bot bot added the cla:yes label Apr 10, 2024
Copy link

codecov bot commented Apr 10, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 88.13%. Comparing base (95581cc) to head (9cdecaa).
Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #9886      +/-   ##
==========================================
- Coverage   88.14%   88.13%   -0.02%     
==========================================
  Files         178      178              
  Lines       22459    22507      +48     
==========================================
+ Hits        19797    19837      +40     
- Misses       2662     2670       +8     
Flag Coverage Δ
integration 85.57% <100.00%> (-0.02%) ⬇️
unit 61.82% <73.17%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

core/dbt/events/types.py Outdated Show resolved Hide resolved
core/dbt/parser/manifest.py Outdated Show resolved Hide resolved
core/dbt/parser/manifest.py Outdated Show resolved Hide resolved
.changes/unreleased/Features-20240409-233347.yaml Outdated Show resolved Hide resolved
@QMalcolm QMalcolm force-pushed the qmalcolm--9397-deprecate-spaces-in-model-names branch from 41e5b5d to 7b60ab0 Compare April 10, 2024 19:26
@QMalcolm
Copy link
Contributor Author

Interesting the failing unit tests are also failing locally, but are failing differently 🤔 In the github action they are:

FAILED tests/unit/test_graph.py::GraphTest::test__dependency_list - AttributeError: 'Obj' object has no attribute 'ALLOW_SPACES_IN_MODEL_NAMES'
FAILED tests/unit/test_graph.py::GraphTest::test__model_incremental - AttributeError: 'Obj' object has no attribute 'ALLOW_SPACES_IN_MODEL_NAMES'.
FAILED tests/unit/test_graph.py::GraphTest::test__model_materializations - AttributeError: 'Obj' object has no attribute 'ALLOW_SPACES_IN_MODEL_NAMES'
FAILED tests/unit/test_graph.py::GraphTest::test__partial_parse - AttributeError: 'Obj' object has no attribute 'ALLOW_SPACES_IN_MODEL_NAMES'
FAILED tests/unit/test_graph.py::GraphTest::test__single_model - AttributeError: 'Obj' object has no attribute 'ALLOW_SPACES_IN_MODEL_NAMES'
FAILED tests/unit/test_graph.py::GraphTest::test__two_models_package_ref - AttributeError: 'Obj' object has no attribute 'ALLOW_SPACES_IN_MODEL_NAMES'
FAILED tests/unit/test_graph.py::GraphTest::test__two_models_simple_ref - AttributeError: 'Obj' object has no attribute 'ALLOW_SPACES_IN_MODEL_NAMES'

But locally the errors are:

FAILED tests/unit/test_graph.py::GraphTest::test__dependency_list - AttributeError: 'Namespace' object has no attribute 'SEND_ANONYMOUS_USAGE_STATS'
FAILED tests/unit/test_graph.py::GraphTest::test__model_incremental - AttributeError: 'Namespace' object has no attribute 'SEND_ANONYMOUS_USAGE_STATS'
FAILED tests/unit/test_graph.py::GraphTest::test__model_materializations - AttributeError: 'Namespace' object has no attribute 'SEND_ANONYMOUS_USAGE_STATS'
FAILED tests/unit/test_graph.py::GraphTest::test__partial_parse - AttributeError: 'Namespace' object has no attribute 'SEND_ANONYMOUS_USAGE_STATS'
FAILED tests/unit/test_graph.py::GraphTest::test__single_model - AttributeError: 'Namespace' object has no attribute 'SEND_ANONYMOUS_USAGE_STATS'
FAILED tests/unit/test_graph.py::GraphTest::test__two_models_package_ref - AttributeError: 'Namespace' object has no attribute 'SEND_ANONYMOUS_USAGE_STATS'
FAILED tests/unit/test_graph.py::GraphTest::test__two_models_simple_ref - AttributeError: 'Namespace' object has no attribute 'SEND_ANONYMOUS_USAGE_STATS'

They're failing on different flag attributes being accessed, and in different parts of the codebase... weird

@QMalcolm
Copy link
Contributor Author

I just checked out main and am still running into the AttributeError: 'Namespace' object has no attribute 'SEND_ANONYMOUS_USAGE_STATS' error in those tests. So something with my local environment is screwy and not getting to the error we're seeing in the GHA. Gonna nuke my tox cache locally and see if that fixes things.

@QMalcolm
Copy link
Contributor Author

After busting my local tox cache, running the tests via make test they all pass. However if I use pytest to directly test the failing graph tests (pytest tests/unit/test_graph.py) they fail with the AttributeError: 'Namespace' object has no attribute 'SEND_ANONYMOUS_USAGE_STATS'.

@QMalcolm QMalcolm force-pushed the qmalcolm--9397-deprecate-spaces-in-model-names branch from 678462d to 958d5ba Compare April 10, 2024 23:53
@QMalcolm QMalcolm requested a review from emmyoop April 11, 2024 00:23
@QMalcolm
Copy link
Contributor Author

Rebased off of main and tracked down the issue 🙂

@QMalcolm
Copy link
Contributor Author

We had a fix on main for the failing integration tests in test_pp_vars.py. Rebasing to bring in those changes.

QMalcolm and others added 13 commits April 11, 2024 15:34
For projects with a lot of models that have spaces in their names, the
warning about this deprecation would be incredibly annoying. Now we instead
only log the first model name issue and then a count of how many models
have the issue, unless `--debug` is specified.
We want to be able to catch more than just `SpacesInModelNameDeprecation`
events, and in the next commit we will alter our tests to do so. Thus
instead of writing a new catcher for each event type, a slight modification
to the existing `EventCatcher` makes this much easier.
…config

Previously in our `test_graph.py` unit tests we were setting the flags global,
but not actually adding them to the config. The config, without the flags,
would then get passed to the manifest loader and other things. Those down stream
classes/functions would then look to the config for the flags and not find them.
This change ensures that the flags get applied to the config that is being used
in down stream operations during our unit tests.
@QMalcolm QMalcolm force-pushed the qmalcolm--9397-deprecate-spaces-in-model-names branch 4 times, most recently from 18fb56c to 9865f4a Compare April 12, 2024 04:11
Using `Note` events was causing test flakiness when run in a multi
worker environment using `pytest -nauto`. This is because the event
manager is currently a global. So in a situation where test `A` starts
and test `tests_debug_when_spaces_in_name` starts shortly there after,
the event manager for both tests will have the callbacks set in
`tests_debug_when_spaces_in_name`. Then if something in test `A` fired
a `Note` event, this would affect the count of `Note` events that
`tests_debug_when_spaces_in_name` sees, causing assertion failures. By
creating a custom event, `TotalModelNamesWithSpacesDeprecation`, we limit
the possible flakiness to only tests that fire the custom event. Thus
we didn't _eliminate_ all possibility of flakiness, but realistically
only the tests in `test_check_for_spaces_in_model_names.py` can now
interfere with each other. Which still isn't great, but to fully
resolve the problem we need to work on how the event manager is
handled (preferably not globally).
Previously we only logged out the count of how many invalid model names
there were if there was two or more invalid names (and not in debug mode).
However this message is important if there is even one invalid model
name and regardless of whether you are running debug mode. That is because
automated tools might be looking for the event type to track if anything
is wrong.

A related change in this commit is that we now only output the debug hint
if it wasn't run with debug mode. The idea being that if they are already
running it in debug mode, the hint could come accross as somewhat
patronizing.
We want people running dbt to be able to at a glance see warnings/errors
with running their project. In this case we are focused specifically on
errors/warnings in regards to model names containing spaces. Previously
we were only ever emitting the `warning_tag` in the message even if the
event itself was being emitted at an `ERROR` level. We now properly have
`[ERROR]` or `[WARNING]` in the message depending on the level. Unfortunately
we couldn't just look what level the event was being fired at, because that
information doesn't exist on the event itself.

Additionally, we're using events that base off of `DynamicEvents` which
unfortunately hard coded to `DEBUG`. Changing this would involve still
having a `level` property on the definition in `core_types.proto` and
then having `DynamicEvent`s look to `self.level` in the `level_tag`
method. Then we could change how firing events works based on the an
event's `level_tag` return value. This all sounds like a bit of tech
debt suited for PR, possibly multiple, and thus is not being done here.
@QMalcolm QMalcolm force-pushed the qmalcolm--9397-deprecate-spaces-in-model-names branch from 9865f4a to c678cbc Compare April 12, 2024 04:31
Comment on lines +14 to +16
# TODO Move this to dbt_common.ui
def _error_tag(msg: str) -> str:
return f'[{red("ERROR")}]: {msg}'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you going to do this TODO before merging?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not. That would entail opening a a PR in dbt-common and also getting it released, and I didn't want that to block this PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.



@dataclass
class EventCatcher:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

core/dbt/events/types.py Outdated Show resolved Hide resolved
Copy link
Member

@emmyoop emmyoop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[CT-3564] for deprecation - we should not support spaces in model names
2 participants